ABSTRACT


This report is concerned with the design and evaluation of a
microprocessor-based, high-speed, space-borne packet switch. Three
designs are presented: a single processor, a three processor and a
multiple processor design. System architectures are given for all
three. The hardware circuits and software routines
required to implement the three and multiple processor
designs are also presented. A bit-slice microprocessor is used.
This processor has been designed and microprogrammed. Maximum
throughput has been calculated for all three designs. Queue
theoretic models of the three designs have been developed and
used to obtain analytical expressions for the average waiting
times, overall average response times and average queue sizes.

From these expressions, graphs have been obtained showing the
effect of a number of design parameters on system performance.

1.0 INTRODUCTION

In the past decade packet switching has revolutionized
data communication. In 1960 virtually all interactive data
communication networks used circuit switching, which is the
current technology used in telephone networks [1]. Circuit
switching networks preallocate channel bandwidth for an entire 
message. However, since most interactive data traffic occurs 
in short bursts, a large portion of the bandwidth is wasted. 
Thus, as digital electronics became inexpensive and the need 
for more digital data communication networks grew as computer 
technology expanded, the redesign of data communication net- 
works became economically feasible and desirable. Packet 
switching was introduced since it allows for the dynamic allo- 
cation of bandwidth, which permits users to share the same 
transmission line previously assigned to only one user. 

Packet switching has improved the economics of data
communication systems, network reliability and functional
flexibility [1].

Packet switching networks divide the users' messages into
small segments, or packets, of data which move through the 
network towards their destination. All packets are fixed- 
length and serial in structure. Packets consist of a header 
and a body. The header, which precedes the body, contains 
the routing control information which indicates the packet's 
source and destination. In addition, the header also con- 
tains message reconstruction information for use at the des- 
tination. Since a complete message may occupy more than one 
packet, each header contains a message number and a packet
sequence number. Thus, any packets arriving in a scrambled
sequence can be rearranged to correctly yield the entire 
message received. The body of a packet contains the data 
being transmitted. The length of each packet within a net- 
work is fixed for the entire system. 

The routing of these packets is handled by the packet 
switches implemented in the network. These special switches 
replace the previous circuit switches found in telephone
networks and older data communication networks. The scope of
the work presented in the following chapters consists of the 
design and evaluation of these packet switches using micro- 
processors to control the switching functions. 


1.1 Problem Definition

This report examines the problem of designing and evaluating
multiprocessor-controlled packet switches. (The design
and evaluation of a single processor version is presented in
[2,3].) The work presented in the following chapters will investigate
the question of how large a multiprocessor packet switch can
be constructed before the problem of resource contention erodes
the system's performance. The performance of these multiprocessor
designs will be evaluated in terms of their maximum throughput with
respect to the number of users and the number of processors
implemented, average delay within the switch, and queue sizes.

These packet switches must be capable of routing packets
among any number of up to several hundred users. In addition,
all designs must allow the use of these packet switches in
communication satellites as well as in networks using only
land lines. The problems of protocols and error-correction
codes are briefly reviewed in this work.

1.2 Approach to the Problem 

System design considerations are examined first. These 
considerations include protocols, prior work, workload divi- 
sions and resource contention among processors. A review of 
protocols and their effects on throughput is presented. Using 
the information from this investigation of protocols, a deci- 
sion is made on how to handle this problem. 

After the protocol problem is solved, a review of the 
prior single processor design is presented. Using the prior 
design as a foundation, the requirements and goals of the 
multiprocessor designs are formulated. A review of the prior 
design at the functional level allows the workload division 
for the three processor design to be made. 

Once the workload division is made, the contention pro- 
blems relating to the shared resources are investigated. In 
this investigation, each shared resource is identified and
its specific contention problems are examined. Various
solutions to these problems are found and presented. 

Once all the design considerations that influence the 
actual implementation are examined, the system architecture 
of the three processor packet switch is designed. The design 
of the architecture, its operation and functional requirements 
allows the detailed design of the system hardware to be
completed. 

Once the system hardware is designed, the processors 
and their software requirements are defined and designed in 
detail. 

After the design of the three processor system is com- 
pleted, the same design procedure is repeated for the design 
of the multiple processor packet switch. 

With both designs complete, an evaluation of each system 
is carried out. The evaluation determines the maximum through- 
put of both architectures. A queue theoretic model is 
developed that facilitates analysis of delay and queue sizes 
within the packet switch. 


2.0 SYSTEM DESIGN CONSIDERATIONS

The final architectural designs of the multiprocessor- 
based packet switches are influenced by several system de- 
sign constraints and goals. Some of these are considered in 
the single processor architecture [2,3]. Thus, those par- 
ticular considerations will be reviewed briefly in this 
chapter. The remaining design considerations arose directly 
from the use of multiple microprocessors, and are discussed
in detail. The review of each design constraint and
design goal explains the approach taken in
the development of the new system architectures.

2.1 Protocols 

Much attention was given to the analysis of various pro- 
tocols and their effects on the packet switch in the previous 
work [2,3]. Implementation of a full forward error correction
(FEC) scheme, an End-to-End Automatic-Repeat-Request
(ARQ) scheme, and an Up-Link ARQ scheme was considered. The
results of this research were used to select a protocol scheme 
for the multiprocessor architectures. 

Since large system throughput is a major goal, any proto- 
col which was shown to reduce system throughput was eliminated 
from further consideration. 

A reduction in throughput was found to be linked to all 
protocols requiring the packet switch to maintain special 
software. Thus, only protocols which are transparent to the 
packet switch will be supported by the multiprocessor 
architectures. Two protocols which fulfill this requirement
are the FEC scheme and the End-to-End ARQ scheme.

In addition to improving system throughput, "transparent" 
protocols offer the users flexibility. Users can custom tailor 
protocols to meet their needs. Transparent protocols could be 
changed or altered even after the network is completed and 
operational. Also, different protocols could be implemented 
between different users in the same network. 

2.2 Packet Construction

The packet format consists of a body and a header. Pac- 
kets are serial in structure with the header preceding the 
body. The body length of a packet is fixed for a given system. 
However, the selection of this length is generally made from 
a range of 256 bits up to 10240 bits. In order to maximize 
the throughput of the multiprocessor-based packet switches 
under investigation in this report, the recommended body length 
is 10240 bits. 

The packet header contains information required to route 
the packets to their proper destinations. In addition, the 
header also contains special information needed by the destina- 
tion. Since entire messages may exceed the length of a single 
packet, they must be divided into packet-length segments before 
transmission. The last packet of a message will be "padded" 
with blank characters to fill unused bits in the packet 
should the message not require an integer number of packets. 

The special header information is used by the destination 
to reconstruct entire messages which have been sent via
several packets. Therefore, should the packets arrive in a
scrambled sequence, the message is still recoverable. This
information and the routing information are arranged in various
fields. These fields contain, in coded form, the packet's
source, destination, message number and sequence number.
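
The segmentation, padding and reassembly described above can be
sketched as follows. This is only an illustration under stated
assumptions, not the format used by the switch: the field names, the
use of bytes instead of bits, the 8-byte body length and the space
used as the "blank" pad character are all hypothetical, since the
field widths are not given here.

```python
from dataclasses import dataclass

BODY_LEN = 8      # assumed body length in bytes (the report works in bits)
PAD = b" "        # assumed "blank" pad character


@dataclass(frozen=True)
class Packet:
    # Hypothetical header fields; their widths are not given in the text.
    source: int
    destination: int
    message_no: int
    sequence_no: int
    body: bytes


def segment(message: bytes, source: int, destination: int, message_no: int):
    """Split a message into fixed-length packets, padding the last body."""
    packets = []
    for seq, start in enumerate(range(0, len(message), BODY_LEN)):
        body = message[start:start + BODY_LEN].ljust(BODY_LEN, PAD)
        packets.append(Packet(source, destination, message_no, seq, body))
    return packets


def reassemble(packets):
    """Rebuild one message from packets that may arrive out of order."""
    ordered = sorted(packets, key=lambda p: p.sequence_no)
    return b"".join(p.body for p in ordered)
```

Note that the reassembled message still carries the padding of the
last packet; how the destination strips it is outside the scope of
this sketch.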

Since the header information is vital for proper packet 
transmission, its protection is a system design requirement. 
Thus, the header is protected by an error-correcting code. 

The Bose-Chaudhuri-Hocquenghem (BCH) code was chosen for this
task in the original design and is implemented again in the 
multiprocessor systems. The cost of this protection is in- 
creased hardware and software for both the system users and 
the packet switch. However, this increased overhead has been 
deemed necessary in order to maintain the integrity of the 
network. The header is the only part of the packet which has 
error-correction protection that is used directly in conjunc- 
tion with the packet switch. Error protection of the packet 
body is optional to system users and must be implemented at 
the ground stations. 

2.3 The Prior Architecture 

There are several important design philosophies which 
have shaped the architecture of the packet switch. They are 
incorporated in the multiprocessor architectures as well as 
in the single processor architecture. The following are the 
design guidelines used for all architectures: 

1) A fixed packet length must be used by the network.
This simplifies hardware and software requirements.

2) All packet transfers through the switch are done
serially. This eliminates any need for Serial-to-Parallel
and Parallel-to-Serial conversions. (Although
all packet transfers are done serially, the
processor accesses the header in parallel.)

3) Since all packet transfers are serial, this operation
is to be managed by dedicated hardware. Processor
control of this function would decrease system throughput
due to the comparatively slow speed of software.
In addition, the use of dedicated hardware to perform
this task allows the processor to spend more time
making decisions and controlling other system operations.

4) The full capacity of the processor must be utilized
to avoid throughput reduction. This goal is achieved
by requiring that the processor never wait for hardware.
This requires parallel hardware for certain
functional blocks. These blocks are initiated into
action by the software. This hardware completes its
assigned task automatically without further software
supervision. All architectures permit several simultaneous
operations to be performed, since the processor
is free to move on to new tasks once the hardware is
activated.

5) To further increase system throughput, the processor
is only allowed to access the header of each packet.
While in the switch, the packet bodies are left untouched
by the processor. Since the routing information
needed by the processor is found only in the
header, this design goal is easy to implement.



The final system architecture of the single processor
packet switch is presented in Figure 2.1. This packet switch
handles N users who are allocated one line each. Operation
of the system consists of each user transmitting their packets
to the switch, which routes the packets to the proper destination.
The packets arrive at the switch as serial bit streams.
The switch is configured such that any user may communicate
with any other user in the network.

The routing of the users' messages begins with the
buffering of all incoming packets. Each input line is double
buffered. Even with double buffering, the processor service
response time must be short. Buffer overflow will destroy
packets left too long in a buffer. In order to avoid packet
losses, a minimum of processing is done at the input buffers.
As soon as a full buffer is detected, the processor immediately
stores the packet in temporary storage. This storage area is
constructed of shift registers arranged in an array.

Once stored in the shift register array, each packet 
receives additional service. Their headers are decoded by 
the processor to determine each packet's destination. The 
routed packets are assigned to software output queues. Use
of software queues eliminates the need for additional packet
transfers required by hardware queues. Each queue corresponds
to one unique output buffer.

Fig. 2.1 Single Processor System Architecture
(Courtesy of James Burnell)

When an output buffer becomes empty, the processor 
accesses the associated queue for the next packet awaiting 
transmission. Each queue contains the location of each routed 
packet in the array awaiting transmission to that queue's 
corresponding output buffer. Using this information, the 
processor begins the transfer of the queue's oldest packet to 
the proper buffer. Once in the buffer, the packet is then 
transmitted onto the network channel under hardware control. 

The software required to control the packet switch consists
of three routines: the input service routine, the background
service routine and the output service routine.

The input service routine is interrupt driven. Execution 
of this routine begins when the Data Available (DAV) line of 
an input buffer becomes active and is detected by the input 
interrupt polling circuit. Equal priority among all users is 
ensured by the sequential scanning of these DAV lines. 

The first task of this software is the linking of a free 
data path in the input switching network to the full buffer. 
Next, the address of an empty shift register is fetched from 
the Empty Shift Register List (ELIST). This shift register
is then linked to the full buffer via the data path. Finally,
the processor initiates the packet's transfer into the array.
This routine has the highest priority and is uninterruptible.

The background service routine continually scans the
shift register array in search of packets requiring service.
Upon finding one, the processor fetches the header. The header
is corrected, if necessary, by using error pattern data stored
in a Syndrome Decoder ROM. Next, the packet's destination is
determined. The packet's address in the array is then placed
in the proper output queue list. However, if this list is
empty and its corresponding buffer is also empty, the processor
will load the packet directly into the buffer. The packet's
array address will then be placed in ELIST. This routine has
the lowest priority since it is not interrupt driven.

Like the input routine, the output service routine is 
interrupt driven. Detection of an empty output buffer by the 
output interrupt polling circuit forces the execution of this 
software package. This routine must first check the output 
queue associated with the buffer requesting service. If this 
list is empty, the service request "flag" for this particular 
buffer is reset and the processor exits from this routine. 
However, if the queue is not empty, the processor then fetches 
the array address of the oldest packet in the queue. Using 
this address, the processor then links the proper shift regis- 
ter to the empty buffer. This link is established via an 
available data path in the output switching network. Once 
the link is complete, the data transfer begins. This routine 
has the second highest priority. 
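
The interplay of the three routines can be illustrated with a small
software model. This is a sketch under simplifying assumptions, not
the switch software itself: hardware transfers are treated as
instantaneous, header error correction is omitted, interrupts and
priorities are not modelled, and a list of unserviced addresses
(a hypothetical helper) stands in for the scan of the shift register
array. All class and method names are invented for the example.

```python
from collections import deque


class SingleProcessorSwitch:
    """Toy model of the input, background and output service routines."""

    def __init__(self, n_users, n_registers):
        self.array = [None] * n_registers               # shift register array
        self.elist = deque(range(n_registers))          # empty register addresses
        self.unserviced = deque()                       # stand-in for the array scan
        self.queues = [deque() for _ in range(n_users)] # one queue per output buffer
        self.out_buffers = [None] * n_users

    def input_service(self, packet):
        """Input routine: move a full input buffer into an empty register."""
        addr = self.elist.popleft()
        self.array[addr] = packet
        self.unserviced.append(addr)

    def background_service(self):
        """Background routine: route one stored packet, if any."""
        if not self.unserviced:
            return
        addr = self.unserviced.popleft()
        dest = self.array[addr]["destination"]
        if not self.queues[dest] and self.out_buffers[dest] is None:
            self._load(dest, addr)          # queue and buffer empty: load directly
        else:
            self.queues[dest].append(addr)  # otherwise queue the array address

    def output_service(self, dest):
        """Output routine: an output buffer has emptied and requests service."""
        self.out_buffers[dest] = None
        if self.queues[dest]:
            self._load(dest, self.queues[dest].popleft())  # oldest packet first

    def _load(self, dest, addr):
        """Transfer a packet to its output buffer and release the register."""
        self.out_buffers[dest] = self.array[addr]
        self.array[addr] = None
        self.elist.append(addr)
```

Packets are represented here as dictionaries with a "destination"
key purely for convenience.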



2.4 Processor Workload Divisions 

In most multi-microprocessor systems, the primary design
goal is the identification and separation of all tasks which
are relatively independent [4]. Ideally, this allows each
processor to perform a dedicated task. Thus, each processor 
can operate mostly independently of the others. As a result, 
very little data needs to be exchanged among processors rela- 
tive to the total system data flow. 

This design philosophy is implemented in the determination 
of the processor workload division for the multiprocessor- 
based packet switches. The first step in the implementation 
is the identification of each "independent" task. A review 
of the single processor design shows that the operation of the 
packet switch consists of three major tasks. Each of these
is controlled by an independent software routine. The three
tasks are: 

1) Storage of received packets (Input Function) 

2) Routing of each received packet (Routing or Background 
Function) 

3) Transmission of each routed packet (Output Function) 

Now that the "independent" tasks have been identified, 
the workload division can be made; one processor is assigned 
to each of the three tasks. The architecture supports an 
Input Processor, a Routing Processor and an Output Processor. 
Each processor supervises dedicated hardware, executes custom 
software and shares a minimum amount of common resources. 


Resource sharing presents many problems and is the next topic
of discussion.

2.5 Resource Contention Among Processors 

In most multiprocessor systems, shared resources are 
necessary. Unfortunately, they present many control problems 
and may cause reduced throughput. Therefore, they must be 
kept to a minimum. 

Concern over shared resources arises whenever the possi- 
bility of processor contention exists. Contention occurs when 
two or more processors simultaneously request access to the 
same resource. This is known as a race condition [5].
Contention also occurs when one or more processors request access
to a resource currently in use by another processor.

A system's throughput can be severely reduced by contention
in two ways. First, simultaneous access of a resource by two
or more processors will cause havoc in the system. Therefore,
special hardware and/or software is required to schedule
resource allocation, so that only one processor is granted access
to a particular resource at any given time. This requires
that the other processors be "locked out." Implementation of
any resource locking scheme requiring special system software
will reduce throughput. Second, processors which become
"locked out" are forced to wait for the busy resource, and
processor idleness due to contention reduces throughput.

Since increased throughput is the primary goal in the 
design of a multiprocessor packet switch, contention must be
minimized. This goal is achieved by first identifying each
shared resource. The following is a list of shared resources 
compiled from a review of the system architecture: 

1) The shift register array 

2) ELIST 

3) The output queue lists 

4) The output switching network

5) The output buffers 

An analysis of the contention problems for each of these
resources is now needed.

All three processors use the shift register array. Each 
shift register must assume one of the following states: 

1) Empty 

2) Holding an unserviced packet 

3) Holding a routed packet 

4) Shifting out or in a packet in transit 

Empty shift registers with their addresses in ELIST can 
only be accessed by the Input Processor. Shift registers con- 
taining unserviced packets can only be accessed by the Routing 
Processor. The Output Processor can only service shift re- 
gisters containing routed packets. Thus, any shift register 
in one of these three states is free from contention problems. 
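
The ownership rule for the first three states can be written down as
a small lookup. The sketch below is illustrative only; the enum,
the processor labels and the function name are hypothetical. The
fourth state is deliberately left without an owner, since it is the
case that needs separate treatment.

```python
from enum import Enum


class RegState(Enum):
    EMPTY = "empty"
    UNSERVICED = "holding an unserviced packet"
    ROUTED = "holding a routed packet"
    IN_TRANSIT = "shifting a packet in or out"


# Sole processor allowed to access a register in each contention-free state.
OWNER = {
    RegState.EMPTY: "input",
    RegState.UNSERVICED: "routing",
    RegState.ROUTED: "output",
}


def may_access(processor: str, state: RegState) -> bool:
    """True if `processor` is the single owner of a register in `state`."""
    return OWNER.get(state) == processor
```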

However, shift registers containing packets in transit 
from the array to the output buffers present a contention 
problem. As stated earlier, once a packet transfer is 
initiated by a processor, dedicated hardware takes control.
Therefore, the processor is now free to start a new task. In
the case of the Output Processor, the next task is the updating
of ELIST with the address of the packet in transit.
ELIST now contains the address of a shift register whose
contents are only partially transferred. A resource contention
could occur if the Input Processor uses this location to store
a new packet.

Two solutions to this problem exist. One solution is to 
require the Output Processor to temporarily hold the address 
of each packet in transit. This scheme needs hardware to sig- 
nal the completion of transfers, address storage and additional 
control software. Since additional software reduces through- 
put, this scheme is not used. 

Instead, the scheme used requires the array hardware to 
allow the simultaneous transmission of an old packet and the 
storage of a new packet at the same location. Although the 
shift register array is a shared resource, contention pro- 
blems have been avoided. 

The shared resources remaining to be examined all have
one thing in common: each resource is accessed by the Routing
Processor. However, only the output queue lists are accessed
by this processor under normal operation. The Routing
Processor only requires access to ELIST, the output buffers
and the output switching network when a special event occurs.
This event takes place whenever the Routing Processor finds
a packet destined to an output buffer which is empty and whose
output queue list is also empty. The Routing Processor
responds by transmitting the packet directly.

In order to deal with this one special operation, allo- 
cation of many shared resources is required. This increases 
the risk of reduced throughput due to contention. In addition, 
throughput will be reduced by the system software required to 
manage the resource allocations. Therefore, a decision must 
be made whether or not to allow the Routing Processor to 
transmit packets as done previously by the background routine 
in the single processor design. 

Since system throughput is at stake, the Routing Processor 
must not be permitted to transmit packets. Although a new
scheme must be devised to handle this special event, conten- 
tion has been completely eliminated from the output buffer 
system and the output switching network. (Specific details 
on the new scheme are presented next in the contention analy- 
sis of the Output Queue Lists.) These resources are now solely 
controlled by the Output Processor. In addition, ELIST is now 
only accessed by the Input Processor and the Output Processor. 

Each output queue list is associated with one unique 
output buffer. These lists contain the addresses of routed 
packets in the array awaiting transmission. The Routing Pro- 
cessor must access these lists to update them with the 
addresses of newly routed packets. Meanwhile, the Output 
Processor must access the lists to find the next packet re- 
quiring transmission. 

Since the Routing Processor always writes to the lists 
while the Output Processor always reads from the lists, dual 
port RAM's can be used [6]. Dual port RAM's permit two
processors to access them simultaneously provided at least one
processor performs a read operation. Only one processor is
allowed to perform a write operation at one time.

The data structure of the output queue lists is designed 
such that if the processors are accessing the same location 
the queue is considered empty. Only after the Routing Pro- 
cessor updates the list and moves on to the next location can 
the output processor read from the once empty list. Thus, 
the situation of a concurrent read and write operation at a 
single location is avoided. At first glance, the problem of 
contention appears to be solved. However, further investiga- 
tion is needed to ensure that this is true. 
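
A queue of this kind, with one writer and one reader and an "empty"
condition defined by both parties pointing at the same location, can
be sketched as a circular list. This is an assumed realization for
illustration; the circular layout, the pointer names and the choice
to leave one slot unused to distinguish "full" from "empty" are not
taken from the design.

```python
class OutputQueueList:
    """Circular list with one writer (routing) and one reader (output)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.write_ptr = 0   # advanced only by the Routing Processor
        self.read_ptr = 0    # advanced only by the Output Processor

    def is_empty(self):
        # Both pointers on the same slot: nothing is readable yet.
        return self.read_ptr == self.write_ptr

    def put(self, address):
        nxt = (self.write_ptr + 1) % len(self.slots)
        if nxt == self.read_ptr:
            raise OverflowError("queue list full")
        self.slots[self.write_ptr] = address   # write the entry first ...
        self.write_ptr = nxt                   # ... then move on to the next slot

    def get(self):
        if self.is_empty():
            return None
        address = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        return address
```

Because the writer advances its pointer only after the entry is
stored, the reader never reads the slot that is being written.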

A new output buffer status word with three states, "busy,"
"empty," and "idle," is now used. An output buffer in the busy
state is in the process of receiving a packet from the array,
receiving Output Processor service or transmitting a packet.
Once a packet is transmitted, the buffer enters the empty
state, which indicates that the buffer requires service. An output
buffer is placed in the idle state by the Output Processor
when the buffer becomes empty and its queue list is also empty.

When the Routing Processor encounters a packet destined 
for an idle buffer, it must first update the buffer's queue 
list. Then the processor must change the buffer's status 
word to indicate the buffer is empty and requires service 
from the output processor. This operation replaces the prior 
scheme of transmitting packets directly. As stated earlier,
many contention problems are eliminated by this new scheme.
However, a new subtle problem has arisen.

Table 2.1 lists the sequence of events which leads to 
the problem. The first line in the table shows that an empty 
output buffer is receiving processor service. The buffer's 
output queue is currently empty. The Output Processor is pre- 
sently accessing this queue. Meanwhile, a packet destined for 
this buffer is being routed by the Routing Processor. The 
Routing Processor has just read the status word of this buffer 
which indicates the buffer is empty. However, just after the 
status word was read, the Output Processor updated it to cor- 
rectly indicate that the buffer is idle. Line two in the 
table now shows the new status word. In addition, the Routing 
Processor, acting on incorrect information, has placed the 
packet's address into the queue list without updating the 
status word. Line three in the table displays the packet's 
address residing in the queue list while the status word still 
indicates the buffer is idle. Since the Output Processor can 
only service buffers in the empty state, this packet is 
trapped in the system. This packet will remain trapped until 
a new packet arrives for the same destination. The remaining 
lines in the table depict the events leading to the recovery 
of the trapped packet. Since the recovery time may be quite 
long, a solution to this problem must be found. 
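
The sequence leading to the trapped packet can be replayed step by
step. The sketch below follows the prose description only; the
variable names and the address label "addr-A" are hypothetical, and
the two processors are interleaved by hand rather than run
concurrently.

```python
def replay_race():
    """Replay the interleaving described above, one step at a time."""
    status = "empty"   # output buffer status word
    queue = []         # output queue list for this buffer
    trace = []

    # Step 1: the Output Processor services the empty buffer and finds
    # its queue empty. Meanwhile the Routing Processor reads the status
    # word, sees "empty", and so plans to enqueue without updating it.
    output_saw_empty_queue = not queue
    routing_saw = status
    trace.append((status, list(queue)))

    # Step 2: the Output Processor, acting on step 1, marks the buffer idle.
    if output_saw_empty_queue:
        status = "idle"
    trace.append((status, list(queue)))

    # Step 3: the Routing Processor, acting on its stale reading, places
    # the packet address in the queue and leaves the status word alone.
    queue.append("addr-A")
    if routing_saw == "idle":
        status = "empty"
    trace.append((status, list(queue)))

    return trace
```

The final entry of the trace shows an idle buffer with a non-empty
queue, which is the trapped-packet condition.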

In order to solve this problem, a locking scheme is im- 
plemented for the output queue lists and the associated output 
buffer status words. Any time one processor gains access to 
one of the queue lists, the other processor is locked out from
that particular queue and its associated buffer status word.
Thus, when a processor is granted access to these resources,
the processor is ensured of obtaining correct data. Although
the queue lists are never accessed by more than one processor
at one time, the dual port RAM's are still used since they
simplify other hardware and software requirements. The locking
scheme requires some additional hardware and software. Some
processor idleness may also be encountered. However, although
some system throughput is sacrificed, the packet switch's
integrity has been preserved.
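
The effect of locking a queue list together with its status word can
be sketched in software. This is an analogy under assumptions, not
the mechanism used in the switch: a Python threading.Lock stands in
for the locking hardware, and the class and method names are
hypothetical.

```python
import threading


class LockedOutputQueue:
    """One output queue list and its status word, guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queue = []
        self.status = "busy"

    def route_packet(self, address):
        """Routing Processor: enqueue an address and wake an idle buffer."""
        with self._lock:
            self.queue.append(address)
            if self.status == "idle":
                self.status = "empty"   # request Output Processor service

    def buffer_emptied(self):
        """Transmission finished: the buffer now requires service."""
        with self._lock:
            self.status = "empty"

    def service_buffer(self):
        """Output Processor: service a buffer in the empty state."""
        with self._lock:
            if self.status != "empty":
                return None
            if not self.queue:
                self.status = "idle"    # nothing to send
                return None
            self.status = "busy"
            return self.queue.pop(0)    # oldest address first
```

Since the status word is read and updated under the same lock as the
queue, neither processor can act on a stale status value.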

The ELIST is the last shared resource to be analyzed for 
contention problems. ELIST is shared by both the Input Pro- 
cessor and the Output Processor. The Input Processor must 
access this list to find available empty shift registers in 
the array. The Output Processor updates this list with the 
addresses of shift registers released by transmitted packets. 

As in the case of the Output Queue Lists, one processor always 
writes to ELIST while the other always reads from ELIST. 
Therefore, ELIST can be built using dual port RAM's. These 
RAM's allow a simultaneous read operation and write operation 
to take place at different locations without interference. 

The data structure of ELIST is designed such that no 
simultaneous read and write operations can be performed at 
the same location unless the list becomes empty. If the list 
becomes empty, the system faces a far graver problem than
contention. However, ELIST should never become empty under
normal operation. Thus, another shared resource is spared
from contention problems, since the processors using it are
transparent to one another.

In summary, the output queue lists are the only shared 
resources which face contention problems. Details concerning 
how this problem is handled are found in 3.1.5, 3.3.2, and 
3.3.3. 

3.0 THE THREE PROCESSOR DESIGN

The system architecture of the three processor packet 
switch is presented in Figure 3.1. As in the single processor 
design, this packet switch handles N users who are allocated 
one line each. Again, the switch is configured such that any 
user may communicate with any other user in the network. Al- 
though the workload is divided among three processors, the 
function of the switch remains unchanged from the original 
design. Thus, a detailed description of the packet switch's 
operation is not presented. Instead, this chapter focuses on 
the actual hardware, software and processors required to imple- 
ment this new architecture. 

3.1 System Hardware

Under processor control, the system hardware carries out 
the assigned tasks of the packet switch. Since the packet 
switch architecture now supports multiple processors, a new 
control signal labelling scheme has to be adopted. This 
scheme is designed to help eliminate any confusion regarding 
the source and destination of each control signal. Table 3.1 
contains each control signal code format with an example and an 
explanation. Detailed explanations of the circuits and their 
operations are presented in the following sections. 

3.1.1 The Input Buffers 

Displayed in Figure 3.2 is the circuitry required by
an input buffer for one user. All received packets remain in
the input buffer until they are transferred into the shift
register array. In order to reduce the possibility of overflow,
each input channel is double buffered. The two buffers
at each input channel are packet-length shift registers. Buffer
select logic determines which buffer is to be linked to
the input channel while the other buffer is linked to the
Input Switching Network. This select logic is driven by a
packet counter. The packet counter monitors the arrival of
packets by counting each bit. Once an entire packet has been
received, the counter rolls over, activating the buffer select
hardware. The select logic then switches the buffer assignments.
Concurrently, the counter sets the Data Available
(DAV) flag indicating a full input buffer. This flag is
scanned by the Input Buffer Polling Circuit, which is the
next topic presented.

Fig. 3.1 Three Processor System Architecture

Fig. 3.2 Input Buffer for One User
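
The double-buffering scheme described in this section can be
illustrated with a short behavioral sketch. It is not the circuit
of Figure 3.2: the class name, the method names and the default
packet length of 8 bits are assumptions made only for illustration.

```python
class InputBuffer:
    """Behavioral model of one double-buffered input channel (illustrative)."""

    def __init__(self, packet_bits=8):        # packet length is an assumed value
        self.packet_bits = packet_bits
        self.buffers = [[], []]                # two packet-length shift registers
        self.fill = 0                          # buffer currently linked to the input channel
        self.count = 0                         # packet counter (counts received bits)
        self.dav = False                       # Data Available flag

    def receive_bit(self, bit):
        """Shift one bit in; swap the buffers when the counter rolls over."""
        self.buffers[self.fill].append(bit)
        self.count = (self.count + 1) % self.packet_bits
        if self.count == 0:                    # roll-over: a whole packet has arrived
            self.fill ^= 1                     # select logic switches the assignments
            self.buffers[self.fill] = []       # newly linked buffer starts empty
            self.dav = True                    # a full buffer now faces the network

    def read_packet(self):
        """Return the packet held in the buffer linked to the switching network."""
        return list(self.buffers[self.fill ^ 1])
```

In this model a second packet can be received while the first is
still waiting for service, which is the reason given above for
double buffering. A third packet arriving before service would
overwrite the first one.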

3.1.2 The Input Buffer Polling Circuit 

The Input Buffer Polling Circuit appears in Figure 3.3.
This circuit sequentially scans each input buffer's DAV flag
searching for a full buffer. A counter, which cycles through
N values, drives the poller. The counter's output is supplied
to the DAV multiplexer (MUX). Selection of one of the N MUX
inputs is controlled by the counter's value. Each of the MUX
inputs is a DAV signal from an input buffer circuit. The
selected DAV signal is passed onto the Stop Scan flip-flop.
When an active DAV signal is encountered, the Stop Scan
flip-flop is set. Once set, this flip-flop halts the counter.
Simultaneously, it brings the Input Buffer Service Request
(IBSR) line high, informing the Input Processor that a full
buffer has been found.

Fig. 3.3 Input Buffer Polling Circuit

The stabilized counter value represents the address of
that full buffer. This value is sent to the Input Processor
for processing. In addition, the counter's output is supplied
to the Flag Reset Demultiplexer (DEMUX). This DEMUX allows
the processor to send the A-RESET (see Table 3.1) signal to
clear the proper buffer DAV signal. The A-RESET signal also
clears the Stop Scan flip-flop, thus restarting the polling
circuit.
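
The scan, halt and restart behavior of the poller can be sketched
as follows. This is a behavioral model only; the class and method
names are assumptions, and one call to clock() stands for one
step of the scan counter.

```python
class Poller:
    """Behavioral model of the Input Buffer Polling Circuit (illustrative)."""

    def __init__(self, n):
        self.dav = [False] * n      # DAV flags, one per input buffer (the MUX inputs)
        self.counter = 0            # scan counter cycling through N values
        self.stop_scan = False      # Stop Scan flip-flop; also models the request line

    def clock(self):
        """One scan step: test the selected DAV flag, halt on a hit, else advance."""
        if self.stop_scan:
            return                  # counter is halted while a request is pending
        if self.dav[self.counter]:
            self.stop_scan = True   # counter now holds the full buffer's address
        else:
            self.counter = (self.counter + 1) % len(self.dav)

    def a_reset(self):
        """A-RESET: clear the addressed DAV flag and restart the scan."""
        self.dav[self.counter] = False   # routed by the Flag Reset DEMUX
        self.stop_scan = False
```

While the scan is halted, the counter value is the address that
the Input Processor reads.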


3.1.3 The Input Switching Network 

The Input Switching Network can provide a programmable
data path between any input buffer and any location in the
shift register array. This network consists of multiple,
programmable data paths permitting the system to handle
simultaneous packet transfers. A single data path is illustrated
in Figure 3.4.

In order to establish a complete data path in the network,
the Input Processor must first place the address of the
input buffer being serviced into Latch A. The contents of
this latch are supplied to the Data Mux and the Input Buffer
Shift Clock Demux. The Data Mux links the selected input
buffer to the switching network. The Shift Clock Demux supplies
the shift clock to the selected input buffer. Once this
half of the data path is established, the Input Processor sends
the address of the empty shift register to Latch B. This
latch provides the Data Demux, the Shift Register Array Clock
Demux and the Status Demux with the select lines needed to
complete the data path. The Data Demux links the selected
shift register to the network, completing the actual data
path for the packet transfer. The Shift Register Array Clock
Demux supplies the shift clock to the shift register. The
function of the Status Demux and its associated hardware is
explained in 3.1.4.

When a data path has been completed, the Input Processor 
initiates the packet's serial transfer through the data path 
by clearing the Stop Transfer flip-flop. This flip-flop halts 
the packet transfer once the packet counter rolls over. The 
packet counter counts each bit of the packet in transit by 
monitoring the shift clock pulses. This scheme permits every 
packet transfer into the array to be hardware terminated. In 
addition to halting the packet transfers, the Stop Transfer 
flip-flop generates the Data Path Busy signal. This signal 
indicates the status of the data path, which can be either 
idle or busy. Each Data Path Busy signal is sent to hardware 
which provides the Input Processor with the address of a free 
data path. Figure 3.5 is a diagram of this hardware circuit. 
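
The job of the free data path hardware can be stated in a few
lines. The sketch below assumes that the lowest-numbered idle
path is reported first; the selection order actually used by the
circuit of Figure 3.5 is not given in the text, and the function
name is hypothetical.

```python
def free_data_path(busy):
    """Return the address of the lowest-numbered idle data path, or None.

    busy[i] is True while the Data Path Busy signal of path i is active.
    """
    for address, is_busy in enumerate(busy):
        if not is_busy:
            return address
    return None   # every data path is busy
```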

3.1.4 The Shift Register Array 

The function of the Shift Register Array is to provide 
temporary storage for received packets which are waiting to 
be routed and transmitted. A single location in the array is 
shown in Figure 3.6. 

As packets arrive from the input buffers, they are shifted
into the shift registers in the array. Each location actually
uses three shift registers linked together to form the
packet-length storage area required. As the packets are transferred,
the header arrives first and eventually resides in the Packet
Header Shift Register. Unlike the shift registers which contain
the actual packet data bits and the header error protection
bits, this shift register allows parallel accessing of
the header. Since the processor fetches each packet header
and also returns the corrected header to the shift register,
the parallel access feature is a system requirement.

Fig. 3.6 Location in the Shift Register Array

As packets are sent to the array, their headers and header 
correction bits are also sent to the Syndrome Generator [3].
Each shift register location has its own Syndrome Generator. 
This hardware circuit decodes the header information into a 
syndrome. A non-zero syndrome indicates an error in the 
header data. The syndrome is available to the Routing Proces- 
sor which corrects the header using this error pattern infor- 
mation. 
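
The header code used by the switch is not specified in this
section (see [3]). Purely as an illustration of syndrome decoding
with a lookup table, the sketch below assumes a Hamming(7,4)
code whose syndrome equals the position of a single-bit error.
The function names and the table are hypothetical.

```python
def syndrome(word):
    """Syndrome of a 7-bit word; word[i] is the bit at position i + 1.

    Assumes a Hamming(7,4) code whose parity-check columns are the binary
    position numbers, so the syndrome is the XOR of the positions holding a 1.
    """
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s

# Lookup table in the spirit of a syndrome ROM: syndrome -> error pattern.
ERROR_PATTERN = {0: [0] * 7}
for pos in range(1, 8):
    ERROR_PATTERN[pos] = [1 if i == pos else 0 for i in range(1, 8)]

def correct(word):
    """Correct a single-bit error by XORing in the looked-up error pattern."""
    pattern = ERROR_PATTERN[syndrome(word)]
    return [b ^ e for b, e in zip(word, pattern)]
```

A zero syndrome selects the all-zero pattern, so an error-free
header passes through unchanged.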

All packet transfers into the array from the input buffers
are hardware terminated. When the Stop Transfer flip-flop
in the Input Switching Network is set, it halts the
packet transfer hardware. In addition, the output of the
activated flip-flop is sent to the input of the Status Demux.
This Demux passes the flip-flop's signal onto the selected
shift register's Status flip-flop. The activated signal sets the
Status flip-flop, indicating that a packet transfer has been
completed and that this location in the array now contains a
packet requiring service. Every array Status flip-flop is scanned
by the Shift Register Polling Circuit, which appears in Figure
3.7. This poller searches for unserviced packets, notifies
the Routing Processor once one is found, and supplies the
array address of the unserviced packet to the processor. A
set Status flip-flop is cleared once the Routing Processor
accesses the Syndrome Generator at that location. Next, the
poller is restarted by the processor.

Previously, the polling of the array was carried out by 
the processor. This scheme required additional software and 
consumed processor execution time even when empty locations 
were scanned [3]. Thus, the proposed use of a hardware poller
increases the Routing Processor's throughput. 

3.1.5 The Output Queue Lists 

The Output Queue Lists are the software lists contain- 
ing the shift register array address of each routed packet 
awaiting transmission. Each list contains the addresses of 
routed packets destined for that list's associated output 
buffer. The Routing Processor always writes to the lists, 
adding the addresses of newly routed packets. Meanwhile, the 
Output Processor always reads from these lists, fetching the 
next packet to be transmitted. The lists are organized in a 
First-In-First-Out (FIFO) format, resulting in the transmis- 
sion of the oldest packet in the selected list. Figure 3.8 
contains the data structure of the Output Queue Lists. 

The index pointer or "Input Pointer" (IPTR) used by the
Routing Processor points to the next address to be filled.
Once a location is filled, the Routing Processor updates the
IPTR by incrementing it. When the IPTR reaches the end of
the list, it rolls over, returning to the top of the queue
list. The index pointer or "Output Pointer" (OPTR) used by
the Output Processor points to the next location to be read.
In order to fetch the next address from a list, the Output
Processor must perform the read operation and then
increment the OPTR.

The data structure of the lists is designed such that 
when IPTR is equal to OPTR the list is assumed to be empty. 
Under special circumstances, this assumption may cause packet 
losses. This problem is explored further in Chapter 5. 
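
The pointer discipline of one queue list can be sketched as a
circular buffer. The class below is a software illustration with
assumed names. It also shows one circumstance in which the
"IPTR equal to OPTR means empty" rule misleads, namely a list
that is filled completely; whether this is the case examined in
Chapter 5 is not stated here.

```python
class OutputQueueList:
    """FIFO list of shift register addresses with roll-over pointers (illustrative)."""

    def __init__(self, size):
        self.ram = [None] * size
        self.iptr = 0   # next location to be filled (Routing Processor)
        self.optr = 0   # next location to be read (Output Processor)

    def is_empty(self):
        return self.iptr == self.optr                 # the assumption made by the design

    def write(self, address):
        self.ram[self.iptr] = address
        self.iptr = (self.iptr + 1) % len(self.ram)   # roll over at the end of the list

    def read(self):
        if self.is_empty():
            return None
        address = self.ram[self.optr]
        self.optr = (self.optr + 1) % len(self.ram)
        return address
```

Writing as many addresses as the list has locations, with no
intervening read, brings IPTR back to OPTR. The list then looks
empty although every location holds a waiting packet address.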

In the single processor design, the output queue lists 
are stored in local RAM and the index pointers are stored in 
the processor's register file. However, in the multiprocessor 
environment of the new designs, this scheme no longer meets 
system demands. Both the Routing Processor and the Output 
Processor must access those lists. Therefore, the Output 
Queue lists must be stored in RAM's that are available to 
both processors. In order to reduce contention problems, 
each list is stored in a physically different RAM structure. 
This permits the two processors to simultaneously access dif- 
ferent lists without interference. Special locking hardware 
is required to prevent simultaneous access of one RAM should 
the processors fail to access different lists. As mentioned
earlier, the RAM's used are two-port RAM's. The logic diagram
of the AM29705 chips used is presented in Figure 3.9 [6].
Several chips can be arranged to form a RAM structure of the
required width and length.

An additional constraint in the design of the queue lists
is the requirement that the value of each index pointer be
available to hardware test logic. The function of the test
logic is to notify the Output Processor when an output queue
list becomes empty (OPTR = IPTR). Fulfilling this requirement
results in the storing of all queue list index pointers in
hardware counters. Figure 3.10 is the logic diagram of one
Output Queue List structure. The operation of this circuit
is best explained by tracing the procedure followed by the
Routing Processor and the Output Processor as they access a
queue list.

Once the Routing Processor has determined the destination
of a packet, it activates the μP4-B (see Table 3.1) control
line which selects the desired Output Queue List. This control
line, when activated, enables the selected RAM, the
associated locking circuit, and the IPTR updating circuit.
Next, the processor places the shift register array address
into the Output Queue List Data Port. The Routing Processor
then activates the B-REQUEST line to request access to the
queue list. This control signal is sent to all the queue
lists, but is enabled only at the queue list selected by the
μP4-B signal.

If the selected queue list is available, the B-REQUEST
signal sets the WRITE ACCESS CONTROL flip-flop. This flip-flop
then activates the READ LOCK-OUT line, which disables
the READ ACCESS CONTROL flip-flop. Disabling this flip-flop
locks out the Output Processor from this list. In addition,
the set WRITE ACCESS CONTROL flip-flop activates the WRITE
signal, which enables the RAM in the write mode. The address
data latched in the Output Queue List Data Port is then strobed
into the RAM location selected by the IPTR. The IPTR counter
has as many unique values as the RAM has locations. The
Routing Processor is informed of a completed write operation
by the STATUS-B signal, which goes low when the WRITE ACCESS
CONTROL flip-flop is set. Upon receiving the active-low
STATUS-B signal, the Routing Processor reads the associated
Output Status Word (OSW). (The function and operation of the
OSW are discussed in 3.1.8.) The reading of the OSW before the
release of the queue list is required since access to the OSW
is also controlled by the queue list lock hardware. Thus,
only one processor at a time can access both the queue list
and the associated OSW. Once the OSW read operation is performed,
the Routing Processor generates the B-RELEASE signal. This
signal clears the WRITE ACCESS CONTROL flip-flop. Clearing
this flip-flop frees the list since the READ LOCK-OUT signal
and the WRITE signal are deactivated. In addition, the
B-RELEASE signal activates the IPTR UPDATE signal which
increments the IPTR counter.

Fig. 3.10 One Output Queue List

If the Output Queue List selected is locked by the Output
Processor, the B-REQUEST line is disabled by the WRITE
LOCK-OUT line. The Routing Processor is informed of its denied
access via the STATUS-B line, which remains high after the
access request. The action taken by the Routing Processor in
this event is discussed in 3.3.2.

The Output Processor must access an Output Queue List
each time it services an empty output buffer. In order to
access the queue list associated with the output buffer being
serviced, the Output Processor must first activate the proper
μP1-C line. The activated μP1-C line enables the selected
RAM, the associated locking circuit and the OPTR updating
circuit. Next, the Output Processor generates the C-REQUEST
signal. If the selected queue list is locked by the Routing
Processor, the C-REQUEST line is disabled by the READ
LOCK-OUT line. The active-low STATUS-C line will remain high after
the request, notifying the Output Processor of its access
denial. The action taken by the Output Processor is discussed
in 3.3.3.

If the requested queue list is available, the enabled
C-REQUEST signal will set the READ ACCESS CONTROL flip-flop.
Setting this flip-flop activates the WRITE LOCK-OUT, READ,
and STATUS-C lines. The activated WRITE LOCK-OUT disables the
B-REQUEST signal, locking the Routing Processor out from this
queue list. Also activated is the READ signal, which places
the RAM in the enabled read mode. In addition, the STATUS-C
line goes low informing the Output Processor that access has
been granted. Once access has been granted, the Output
Processor checks the EMPTY-C line to determine if the list is
empty. This line is driven by a comparator whose inputs are
the values of IPTR and OPTR. If the two pointers are equal,
the comparator activates the EMPTY-C line.

If the list is empty, the Output Processor releases the
list by activating the C-RELEASE line. The OPTR is not
incremented in this situation since the list is empty. Should
the list not be empty, the Output Processor reads the packet's
address from the location selected by the OPTR. Once the read
operation is complete, the Output Processor activates the
C-RELEASE line, freeing the list. In addition, the activated
C-RELEASE signal increments the OPTR.

Should both the Routing Processor and the Output Processor 
request the same queue list simultaneously, a Default Circuit 
locks out the Routing Processor while granting access to the 
Output Processor. 
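
The locking behavior traced above, including the tie-break of
the Default Circuit, can be summarized in a behavioral sketch.
The class and method names are assumptions. The return value
reports which processor was granted access, standing in for the
active-low STATUS-B and STATUS-C lines.

```python
class QueueListLock:
    """Mutual exclusion between the writer and the reader of one list (illustrative)."""

    def __init__(self):
        self.write_access = False   # WRITE ACCESS CONTROL flip-flop
        self.read_access = False    # READ ACCESS CONTROL flip-flop

    def request(self, b_request=False, c_request=False):
        """Apply B-REQUEST and/or C-REQUEST; return (granted_b, granted_c)."""
        if self.write_access or self.read_access:
            return (False, False)   # list already locked: the request is disabled
        if c_request:               # Default Circuit: the Output Processor wins a tie
            self.read_access = True
            return (False, True)
        if b_request:
            self.write_access = True
            return (True, False)
        return (False, False)

    def release(self):
        """B-RELEASE or C-RELEASE: clear whichever flip-flop is set."""
        self.write_access = False
        self.read_access = False
```

The pointer updates that accompany the release signals are left
out of this model.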

An important point to note about this component is that
although the hardware implementation of the index pointers is
a system requirement, the system is enhanced by this feature.
The first benefit of this scheme is the reduction of software
due to the decrease in index pointer management overhead.
The second benefit is the reduced number of register files
required by the processors since the index pointers are stored
externally. This reduces the processor's complexity. An
additional point about this scheme is that it can be implemented
in single processor systems as well as in multiprocessor
systems.


3.1.6 The Output Switching Network 

Illustrated in Figure 3.11 is one data path in the
Output Switching Network. As in the Input Switching Network,
the function of this network is to provide the Output Processor
with programmable data paths. These data paths are used to
link shift registers in the array to output buffers. Packet
transfers through the switching network are processor
initiated and hardware terminated. There are multiple data
paths allowing simultaneous packet transfers. The circuitry
required to monitor the status of all data paths in the
switching network is presented in Figure 3.12. This circuit
provides the Output Processor with the address of a free data
path when one is needed.

3.1.7 The Output Buffers 

The function of the output buffers is to receive
packets transferred from the shift register array and to then
transmit those packets to the external channel hardware.
Packets arrive at a rate determined by the internal shift
clock. Packets then leave at the rate maintained by the
external line clock. The logic diagram for one Output Buffer is
given in Figure 3.13.

The central component of the buffer is a packet-length
shift register where the packets are stored. While the COUNT
FINISHED line is inactive, the packet is shifted into the
shift register by the internal shift clock. Meanwhile, the
INHIBIT XMIT flip-flop remains set, disabling the external
shift clock. Once the COUNT FINISHED line is activated by
the Output Switching Network, the INHIBIT XMIT flip-flop is
cleared. This action enables the external shift clock, which
then begins to shift the packet onto the channel line. The
packet counter monitors the complete transfer. As soon as
the last packet bit is shifted out of the buffer, the counter
rolls over. The carry out line from the counter sets the
INHIBIT XMIT flip-flop and generates the BUFFER EMPTY signal.
The BUFFER EMPTY signal is supplied to the output buffer's
Output Status Word (OSW). The function and operation of the
OSW are presented as the next topic.

3.1.8 The Output Status Words 

The Output Status Word (OSW) of an Output Buffer is 
hardware circuitry used to monitor and reflect the current 
status of the buffer. All OSW's are accessible to both the 
Routing Processor and the Output Processor. Each OSW is linked 
to an associated Output Queue List. Thus, just as in the case 
of the queue lists, only one processor may access a particular 
OSW at any given time. This scheme eliminates the possibility 
of one processor reading an OSW while the other processor is 
altering the same OSW. 

Each OSW indicates one of the three states that its
corresponding output buffer is in. The three output buffer
states are Busy, Empty and Idle. An output buffer is in the
Busy state whenever it is receiving a packet, transmitting a
packet or receiving Output Processor service. Output buffers
enter the Empty state when the packets that they were
transmitting are completely transferred onto the channel lines.
The Output Processor places an empty output buffer in the Idle
state when the corresponding Output Queue List is also empty.
The hardware implementation of one OSW and the Output Buffer
Polling Circuit used to scan each OSW is illustrated in
Figure 3.14.

Fig. 3.14 Output Status Word and the Output Buffer Polling Circuit

The Output Buffer Polling Circuit sequentially scans
each OSW in search of an empty buffer. When a buffer empties,
its support hardware generates a BUFFER EMPTY signal. This
signal sets the OSW's SERVICE REQUEST flip-flop. The activated
SERVICE REQUEST line is eventually found by the poller
as it scans the OSW's. Finding an empty buffer, the poller
signals the Output Processor and supplies the processor with
the address of the empty buffer.

The Output Processor then accesses the Output Queue List
associated with the empty buffer. If the list is empty, the
Output Processor updates the OSW to indicate that the buffer
is in the Idle state. This update is done when the Output
Processor generates the C-IDLE signal. (The proper μP1-C
select signal is still enabled from the queue list access.)
The poller is restarted by the C-RESET signal. If the list is
not empty, the C-SERVICE signal is activated to clear the
SERVICE REQUEST flip-flop. This updates the OSW to indicate
that the buffer now is busy. The poller is restarted by the
C-RESET signal.

Every time the Routing Processor updates an Output Queue
List, it checks the corresponding OSW. If the OSW indicates
that the buffer is not idle, the OSW is left unchanged.
However, if the OSW indicates that the buffer is in the Idle
state, the Routing Processor updates the OSW to indicate that
the buffer is empty. This update is accomplished when the
Routing Processor activates the B-EMPTY signal. (The proper
μP4-B select line is still enabled from the queue list access.)
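
The OSW behavior described in this section amounts to a three
state machine. The transition table below is read from the text
and is a sketch, not a description of the flip-flop level
circuit of Figure 3.14. The names State, TRANSITIONS and
next_state are hypothetical.

```python
from enum import Enum

class State(Enum):
    BUSY = "busy"
    EMPTY = "empty"
    IDLE = "idle"

# (current state, signal) -> next state; other combinations leave the state unchanged.
TRANSITIONS = {
    (State.BUSY, "BUFFER EMPTY"): State.EMPTY,   # last bit has left the buffer
    (State.EMPTY, "C-SERVICE"): State.BUSY,      # Output Processor found a queued packet
    (State.EMPTY, "C-IDLE"): State.IDLE,         # the queue list was empty
    (State.IDLE, "B-EMPTY"): State.EMPTY,        # Routing Processor queued a packet
}

def next_state(state, signal):
    """Return the OSW state that follows the given signal."""
    return TRANSITIONS.get((state, signal), state)
```

The B-EMPTY transition is what returns an idle buffer to the
poller's attention once a packet has been queued for it.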

3.1.9 The Empty Shift Register List

The Empty Shift Register List (ELIST) contains the
array addresses of every empty shift register in the array.
This list is read by the Input Processor and written to by
the Output Processor. Figure 3.15 shows the data structure
used to maintain the list. The index pointer (EPTR0) used
by the Input Processor points to the next shift register
address to be fetched. Once the address data is fetched, the
Input Processor increments EPTR0. The index pointer (EPTR1)
used by the Output Processor points to the last location
updated with the address of a freed shift register. The
Output Processor must first increment EPTR1 and then perform the
write operation.

This data structure is designed such that under normal 
operation, a Read and a Write operation will not take place at 
the same location. Thus, both processors can simultaneously 
access the list without interference. 
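
The two pointer conventions can be sketched as follows. The
initial pointer values and the roll-over at the end of the list
are assumptions, since Figure 3.15 is not reproduced here, and
the class and method names are hypothetical. How an exhausted
list is detected is not modeled.

```python
class EmptyList:
    """ELIST sketch: read-then-increment reader, increment-then-write writer."""

    def __init__(self, addresses):
        self.ram = list(addresses)          # initially every shift register is empty
        self.eptr0 = 0                      # next entry the Input Processor fetches
        self.eptr1 = len(self.ram) - 1      # last entry the Output Processor wrote

    def fetch(self):
        """Input Processor: read at EPTR0, then increment EPTR0."""
        address = self.ram[self.eptr0]
        self.eptr0 = (self.eptr0 + 1) % len(self.ram)
        return address

    def free(self, address):
        """Output Processor: increment EPTR1, then write the freed address."""
        self.eptr1 = (self.eptr1 + 1) % len(self.ram)
        self.ram[self.eptr1] = address
```

Because a freed address can only be written into a location that
has already been fetched, the writer stays behind the reader and
the two do not touch the same location in normal operation.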

Illustrated in Figure 3.16 is the hardware circuit
required to implement the ELIST. As in the Output Queue List
system, the pointers are implemented in hardware and the RAM
is a 2-port RAM. Although this is not necessary, since
neither processor requires access to the other's pointer, it
does reduce software overhead. Since this increases
throughput, this scheme is proposed over the previous scheme of
storing the pointers in the register file. Hardware index
pointers could also be used in a single processor
system by using an up/down counter to hold a single index
pointer. The ELIST data structure which supports the single
index pointer is found in [2] and [3].

3.2 The Processors 

As stated earlier, there exist three classes of processors
in this implementation of a packet switch. Although each
processor's function is quite different, the actual processor used
in each class is constructed around a similar architecture.
The custom software executed by each processor and the blocks
of unique support hardware are the two elements which give each
class of processor its distinct character. As in the single
processor design, the processors are built using the Advanced
Micro Devices (AMD) 2900 family of bit-sliced processing
components. The design considerations which led to the selection
of these components are discussed in [2] and [3].

3.2.1 General Processor Architecture 

The architecture of all classes of processors is
comprised of two functional blocks: the Microprogram Control
Unit (MCU) and the Instruction Execution Unit (IEU). The
Routing Processor contains one additional functional block:
the Syndrome Read Only Memory (ROM), which contains the header
error correction information in a lookup table format.
Figure 3.17 contains the block diagram of the processor
architecture.

3.2.2 The Instruction Execution Unit

The Instruction Execution Unit (IEU) of the Input 
Processor and the Output Processor is presented in Figure 
3.18. Figure 3.19 shows the IEU of the Routing Processor. 

Both versions of the IEU incorporate the AMD 2903 four-bit 
ALU slices. Shown in Figure 3.20 is the block diagram of the 
AMD 2903 ALU chip. Cascading these chips in parallel will 
provide the required width of the processor word. The AMD 
2903 has been selected over the AMD 2901 ALU because the 2903 
architecture supports two Direct Data Inputs. The use of the 
second data input allows the data from the polling circuits 
to be directly supplied to the ALU. This reduces software 
overhead since a typical two instruction read operation is no 
longer required. Instead, the data is sent directly to the 
ALU during the execution of a single instruction. Since this 
scheme is implemented for each class of processors, a total 
of three memory cycles has been saved, improving throughput. 

All the arithmetic and logical operations required for
address generation and data manipulation are carried out by
the IEU. Inputs to the ALU are supplied by five different
sources: the Input Bus (IBUS), the Microprogram Word (μW),
Scratchpad 1, Scratchpad 2, and the polling circuit. The IBUS
provides a data path from all external memory and data ports
to the ALU. Immediate operands in the Control Memory are
supplied to the ALU via the μW input. Scratchpads 1 and 2
are two file registers located in the ALU's internal RAM.
Their addresses are supplied by an external circuit which can
provide the hardwired address of the selected register when
needed. Data from the polling circuit provide the processor
with the address of the device requesting service.

Fig. 3.18 The IEU for the Input and Output Processors

Fig. 3.19 The Routing Processor's IEU

Once the ALU inputs are processed, the result leaves the ALU
via the Output Bus (OBUS). This bus is sent to system
hardware, the Address Latch, the Address Decoder and the Data Bus
Decoder (Input Processor and Output Processor only). The
Address Latch holds address data stable during read and write
operations. The Address Decoder generates device select lines.
When used in conjunction with the Data Bus Decoder, the Address
Decoder forms an Addressing Matrix which can activate single
bit control lines [3]. This matrix is illustrated in
Figure 3.21.

3.2.3 Microprogram Word IEU and System Hardware Control 
Fields 

IEU hardware and blocks of system hardware receive
control signals from various fields within the Microprogram
Word (μW). Along with control signals, the ALU can receive
operands from the microprogram word. Control signals from
the μW are also sent directly to system hardware blocks.
These signals do not require processing by the IEU.
Therefore, while the processor performs one task, the μW control
signals can activate components of the system hardware. This
hardware can either assist the processor in completing its
task or independently perform a different task. This
scheme permits concurrent operations to be carried on within
the packet switch.

Presented in Figures 3.22, 3.23 and 3.24 are the segments
of the μW which are required to control the IEU and the system
hardware. In addition, these figures and Figure 3.25 contain
the tables used to microprogram the packet switch.

3.2.3.1 ALU Source Fields

The AMD 2903 ALU chip provides the ALU with two
operand inputs labeled R and S. A 2-1 MUX supplies the R
operand input with data from either the A output from the internal
register file or the external A-Direct-Data (DA) input. Since
no class of processors utilizes the A register file, the DA
input is permanently selected. External to the ALU, a 2-1 mux
selects either the μW operand data or the data held in the
IBUS Latch, and supplies the selected source to the DA input.
This mux is controlled by the R SOURCE field in the μW.

The ALU's S input has three sources: the B output of the
internal register file, the B-Direct-Data (DB) input and the
internal Q register. Addresses for the B register file are
supplied to the AM2903 via the external B SOURCE mux. This
mux has the hardwired addresses for each scratchpad register
used as its inputs. The B ADDRESS field in the μW controls this
mux. Data supplied to the DB input arrives from the
processor's polling circuit. Both the B register file output
and the DB input are tristated. Tristate bus control is
essential since both inputs share the same internal data bus.
This data bus forms one of the two inputs to an internal 2-1
mux. The other mux input is the output from the Q register.
Selection of the S input is made by the μW S SOURCE control
field. This μW field controls the 2-1 mux and the tristate
logic.

Fig. 3.24 Output Processor IEU μW Control Fields

Fig. 3.25 ALU Control Fields

3.2.3.2 ALU Function Fields

The selection of an ALU arithmetic function or
logical operation is determined by the μW ALU Function
field.

3.2.3.3 ALU Destination Fields

Internally, the 2903's ALU output is sent to both
the register file's DATA IN input and the Q register (via the
Q shifter). The ALU's output is also available to the Output
Bus (OBUS) via an internal tri-state buffer. The ALU
Destination field can direct the ALU output to any or all of these
locations.


3.2.3.4 Bus Control Fields

In order to hold address data stable, the OBUS is
supplied to two address latches: the Address Latch and the
ROM Address Latch (Routing Processor only). These latches are
enabled by the Bus Latch μW field in conjunction with the
Phase 2 clock (see 3.2.5).

The various μW Read and Write fields control data
transfers between the processors and external hardware.

3.2.3.5 System Hardware Control Fields

The System Hardware Control Fields consist of various
control bits used to activate system hardware operations.
These signals are sent directly to the hardware since they do
not control IEU operations. However, they usually act in
conjunction with the processor, often helping to speed up
processor tasks. They also may direct hardware operations which
carry out independent tasks. Thus, use of these special
control bits has improved system throughput.

3.2.4 The Microprogram Control Unit 

The function of the Microprogram Control Unit is
twofold: it must control the execution of the processor's
software and it must supply the microprogram's control signals
to the IEU and the system hardware. A diagram of this unit
is given in Figure 3.26. The MCU consists of an AMD 2911
microprogram sequencer, jump control logic (implemented by a
Programmable Logic Array (PLA)) [3], a pipeline register
and the microprogram memory. A block diagram of the AMD 2911
chip is presented in Figure 3.27. This device generates the
μprogram counter value used to control the execution sequence
of the processor's microprogram. Next address selection
provides the MCU with one of the two possible next addresses.
Either the μprogram counter or the address in the Jump Address
field of the μW is supplied to the address lines of the
microprogram memory. The PLA Jump Control Logic determines
this selection. Inputs to the PLA Jump Control Logic come
from various system status signals and the Next Address Select
μW field. Figure 3.28 contains the Next Address Select field
and the Jump Address field. The Jump Control Logic Function
for each class of processor is given in Figure 3.29.

Fig. 3.27 Am2911 Microprogram Sequencer

The output of the AMD 2911 can be unconditionally reset
to zero. This allows the system to initialize program
execution whenever required. Table 3.2 contains the μW widths for
each class of processors. Since some bits in the μW remain at a
constant logic value, they can be hardwired. This reduces
the actual Microprogram Memory (ROM) widths.
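
The two-way next address selection described in this section can
be sketched in a few lines. The encoding of the Next Address
Select field used here ('CONT', 'JUMP', or the name of a status
signal) is an assumption made for illustration. The actual
field layout is the one given in Figures 3.28 and 3.29.

```python
def next_address(upc, word, status):
    """Pick the next microprogram address.

    upc is the address of the current microword; the sequential choice is
    upc + 1. word is (select, jump_address); select is 'CONT', 'JUMP', or
    the name of a status signal that must be true for the jump to be taken.
    """
    select, jump_address = word
    if select == "JUMP":
        take_jump = True
    elif select == "CONT":
        take_jump = False
    else:
        take_jump = status.get(select, False)   # role of the PLA Jump Control Logic
    return jump_address if take_jump else upc + 1
```

A microword at address 0 that jumps to address 0 while a status
signal is true waits in place until that signal changes.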

3.2.5 Processor Timing 

A two-phase clock drives the processors. This clock 
controls the timing of internal and external data transfers. 
Figure 3.30 presents the waveforms and significant timing 
events. Phase 1 latches the internal data of the 2903 ALU 
and the 2911 microprogram sequencer. Phase 2 is required to
stabilize data in the IEU hardware that is external to the
ALU. In addition, this clock phase is used to latch data and
address information required for external data transfers to
I/O ports and memory. Each clock cycle has a period of 120
nanoseconds, which yields a maximum clock frequency of 8.33 MHz
[3].


3.3 The System Software 

Each class of processor executes a unique software routine.
The three different routines are the Input Service
Routine, the Routing Service Routine and the Output Service
Routine. A detailed explanation of each routine's function
is presented next.

(1) a) Current instruction is latched into pipeline
       register.
    b) Data is clocked into Q register.

(2) a) A and B latches internal to 2903 are open.
    b) XBUS latch is open.
    c) READ line is low during this time in a read operation.

(3) a) A and B latches, IBUS latch are closed.
    b) ALU output is stable.
    c) WE is low if storing into register file.

(4) Address is latched on this edge during an
    address generation operation.

(5) If Write microprogram word bit is high, WRITE
    goes low during this pulse.

Fig. 3.30 Processor Clock Waveforms
(Courtesy of James Burnell)

3.3.1 The Input Service Routine

The Input Service Routine is executed by the Input
Processor. Shown in Figure 3.31 is the flowchart of this
software routine. This routine is sense-loop driven. The
Input Processor loops on a status bit which is controlled by
the Input Polling Circuit. When the poller finds a full input
buffer, it updates the sense-loop status bit. Once the
processor leaves the loop, it fetches the address of the full
input buffer. This address is supplied by the polling circuit.
Next, the processor clears the buffer's DAV flag and
restarts the poller. Restarting the poller before service
is complete allows the poller to find the next full buffer
before the processor returns to the sense loop. This scheme
reduces processor idleness due to poller scan time.

The Input Processor then fetches the address of a free
data path in the Input Switching Network. This address is
supplied by the Input Data Path Status Port. The ELIST is
accessed next. Using this list, the Input Processor fetches
the address of a free shift register in the array. After obtaining
the address stored in the location selected by the
EPTR0 index pointer, the Input Processor increments EPTR0.
EPTR0 now points to the next empty shift register address
stored in ELIST. Using the three addresses mentioned above,
the Input Processor links the full input buffer to the empty
shift register via the free data path. Upon completion of the
link, the Input Processor initiates the packet's transfer into
the array. The Input Processor then returns to the sense loop.
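
The sequence of steps in the Input Service Routine can be summarized by the following Python sketch. It is a behavioural model only, not the microcode of Figure 3.32. The class name InputSide, the representation of a link as a tuple, and the wrap-around of EPTR0 at the end of the list are assumptions made for illustration; the full-buffer address and the free data path are passed in as arguments, standing in for the values supplied by the polling circuit and the Input Data Path Status Port.

```python
from dataclasses import dataclass, field


@dataclass
class InputSide:
    elist: list                 # addresses of empty shift registers
    eptr0: int = 0              # index of the next ELIST entry to hand out
    links: list = field(default_factory=list)  # (buffer, path, register) records

    def service(self, full_buffer: int, free_path: int) -> tuple:
        """Handle one full input buffer, following the order given in the text."""
        register = self.elist[self.eptr0]                 # fetch a free shift register
        self.eptr0 = (self.eptr0 + 1) % len(self.elist)   # increment EPTR0 (wrap assumed)
        link = (full_buffer, free_path, register)         # buffer -> path -> register
        self.links.append(link)                           # stands in for starting the transfer
        return link
```

For example, with elist=[7, 3, 9], servicing buffer 2 over data path 0 links it to shift register 7 and leaves EPTR0 pointing at the entry holding 3. The sketch does not model an exhausted ELIST.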

Fig. 3.31 Input Service Routine Flowchart

A listing of this routine is given in Figure 3.32. Since
the instruction set of each processor is custom tailored, no
standard computer language exists to describe the packet
switch's software. In order to document the software, a simple
format is used to code each line of software:

<1st operand><operation><2nd operand> -> <destination>.

In the listing, instructions performing a single task are
grouped together, followed by a comment explaining their function.
Concurrent task execution is noted by ";". In addition, the µP
Address Code is listed next to the instruction which generates
that particular control signal. This is done to help explain how
the software interfaces with the system hardware. Each line of
code listed requires 120 nanoseconds of execution time.
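
Because every listed line takes one 120 nanosecond cycle, the time spent on any path through a routine follows directly from the number of lines executed. The short Python sketch below shows the arithmetic. The function names are introduced here for illustration, and the line count used in the example is arbitrary; it is not the length of any listing in this report.

```python
CYCLE_NS = 120.0  # one listed line of code per clock cycle, from the text


def clock_frequency_mhz(cycle_ns: float = CYCLE_NS) -> float:
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1000.0 / cycle_ns


def path_time_ns(lines_executed: int, cycle_ns: float = CYCLE_NS) -> float:
    """Execution time of a path that runs the given number of listed lines."""
    return lines_executed * cycle_ns


def service_rate_per_second(lines_executed: int, cycle_ns: float = CYCLE_NS) -> float:
    """Upper bound on how many times per second such a path can be executed."""
    return 1e9 / path_time_ns(lines_executed, cycle_ns)
```

A hypothetical 10-line path would take 1200 nanoseconds, so a processor could execute it at most about 833,000 times per second.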

3.3.2 The Routing Service Routine 

Execution of the Routing Service Routine is carried 
out by the Routing Processor. The flowchart of this software 
routine is illustrated in Figure 3.33. This routine is sense 
loop driven. The processor loops waiting for the Shift 
Register Polling Circuit to indicate that a newly arrived 
packet has been found in the array. When a new packet is 
found, the Routing Processor leaves the loop and fetches the 
packet's array address from the poller. Using this address, 
the Routing Processor fetches the packet's syndrome from the 
Syndrome Generator. This syndrome is latched into the address 
input of the Syndrome Decoder ROM. Simultaneously, the shift

INPUT:  If IBSR-A = 0, JMP TO INPUT

            *Is there an input buffer
             requesting service?
             NO: Loop @ INPUT.

        Input Polling Port -> Q

            *YES: Input the address
             of the buffer requesting
             service.

        [ELIST] @EPTR0 -> Scratch 1; Reset Poller

            *Find a free shift register,
             clear IBSR-A and
             restart poller.

        Input Data Path Status Port Address -> Address Latch (µP1-A)

        Data Path Busy Status Port -> Q; Update EPTR0

            *Find a free data path
             and increment EPTR0.

        Scratch 2+Data Path Latch A Base Address -> Address Latch (µP2-A)

        Q -> Data Path Mux Select Latch A(D)

            *Link the input buffer
             to the data path.

        Scratch 2+Data Path Latch B Base Address -> Address Latch (µP3-A)

        Scratch 1 -> Data Path Demux Select Latch B(D)

            *Link the empty shift
             register to the data
             path.

        Data Path Transmit Control Address -> Address Latch

        Scratch 2 -> Data Bus Decoder (M1-A); Jump to INPUT

            *Start data transfer
             and return to the
             sense loop.


Fig. 3.32 Input Service Routine

Fig. 3.33 Packet Routing Service Routine Flowchart

Fig. 3.33 Packet Routing Service Routine Flowchart, continued

register status flag is cleared and the poller is restarted.
The output from the Syndrome Decoder ROM is exclusive-ORed
with the packet's header. This operation yields the corrected
header, which is stored back into the shift register.
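
The correction step (syndrome, ROM look-up, exclusive-OR) can be illustrated with a small Python sketch. The parameters of the BCH code protecting the header are not given in this section, so the sketch substitutes a (7,4) Hamming code as a stand-in; the word length, the syndrome function and the ROM contents below are therefore assumptions chosen only to show the mechanism.

```python
WORD_BITS = 7  # stand-in code length; the report's header code is not specified here


def syndrome(word: int) -> int:
    """Syndrome of a 7-bit word: XOR of the positions (1..7) of its set bits."""
    s = 0
    for position in range(1, WORD_BITS + 1):
        if (word >> (position - 1)) & 1:
            s ^= position
    return s


# Decoder ROM: addressed by the syndrome, it returns the error word.
# Syndrome 0 means no error; syndrome p means bit position p is wrong.
DECODER_ROM = [0] + [1 << (p - 1) for p in range(1, WORD_BITS + 1)]


def correct_header(received: int) -> int:
    """Exclusive-OR the received header with the error word from the ROM."""
    error_word = DECODER_ROM[syndrome(received)]
    return received ^ error_word
```

In this stand-in code the word 0b0000111 is valid (its syndrome is zero). If bit position 5 is inverted in transit, the received word is 0b0010111, the syndrome is 5, the ROM returns 0b0010000, and the exclusive-OR restores 0b0000111.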

Using the corrected header, the Routing Processor deter- 
mines the packet's destination. In order to route the packet, 
the Routing Processor must place the packet's array address 
into the proper Output Queue List. As stated earlier, these 
lists are shared resources which have contention problems. 
Thus, they are regulated by hardware locks which permit access 
to only one processor at a time. Therefore, before accessing 
any list, the Routing Processor must request access. In order 
to minimize the time spent accessing these resources, the 
shift register address data is first placed into the Output 
Queue List Data Port. The Routing Processor then requests 
access to the selected queue list. If access is granted, the 
data in the data port is automatically strobed into the queue 
list at the location specified by the IPTR. This scheme per- 
mits the Routing Processor to move on to the next task rather 
than writing to the list. 

Once the shift register address is placed into the proper 
queue list, the Routing Processor checks the associated OSW. 

If the OSW indicates that the corresponding buffer is not in 
the Idle state, the Routing Processor releases the Output 
Queue List. The signal generated to release the queue list 
also activates the IPTR update circuitry, which automatically 
increments the IPTR counter. After releasing the queue list, 
the Routing Processor returns to the sense loop. 

Should the OSW indicate that the output buffer is in the
Idle state, the Routing Processor updates the OSW. After this 
update, the OSW indicates that the buffer is now in the Empty 
state, waiting for Output Processor service. The Routing 
Processor then releases the queue list and returns to the 
sense loop. 

If the selected Output Queue List is not available, the
Routing Processor loops, repeating the access request. This loop is called
a SPIN LOCK since the processor spins on the hardware lock
while waiting for the busy resource to be freed [5]. There
exists an alternative locking scheme called the SUSPEND LOCK.
This alternative scheme requires the processor to suspend the
current task which needs the busy resource [5]. This task
is temporarily put aside as the processor moves on to a new
task. Implementation of this scheme was considered, but was
abandoned. Several reasons led to the abandonment of the
Suspend Lock:

1) The additional hardware and software required to sus- 
pend and resume jobs. 

2) The next task selected may also require the busy 
resource. 

3) The time wasted idling in the spin lock is far shorter 
than the time required to suspend and resume the exe- 
cution of a job. 

4) The possibility that no new task existed, resulting 
in wasted time as the processor suspended the only 
job available. 

Thus, the Spin Lock is used in both the Routing Service
Routine and the Output Service Routine. A listing of the 
Routing Service Routine is contained in Figure 3.34. 
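
The spin lock can be illustrated in software as follows. This Python sketch is an analogy only: in the packet switch the lock is a hardware circuit and the "request" is a microinstruction, whereas here a threading.Lock stands in for the hardware lock and the class and function names are hypothetical.

```python
import threading


class HardwareLock:
    """Software stand-in for the hardware lock guarding one queue list."""

    def __init__(self):
        self._lock = threading.Lock()

    def request(self) -> bool:
        """Return True if access is granted, False if the list is busy."""
        return self._lock.acquire(blocking=False)

    def release(self) -> None:
        self._lock.release()


def append_with_spin_lock(lock: HardwareLock, queue_list: list, address: int) -> int:
    """Spin on the lock, write one address, release; return the number of spins."""
    spins = 0
    while not lock.request():   # spin: repeat the request until access is granted
        spins += 1
    try:
        queue_list.append(address)
    finally:
        lock.release()
    return spins
```

The processor does nothing useful while it spins, which is acceptable here only because, as argued above, the expected wait is shorter than the cost of suspending and resuming a task.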

3.3.3 The Output Service Routine 

The Output Service Routine is executed by the Output 
Processor. Figure 3.35 contains the flowchart of this rou- 
tine. This routine is sense loop driven. The Output Proces- 
sor remains in the loop until the Output Buffer Polling cir- 
cuit locates an empty output buffer. When the poller finds 
an empty buffer, it notifies the Output Processor by changing 
the sense loop status bit. Once the processor leaves the 
loop, it fetches the buffer's address from the poller. Using 
this address, the Output Processor selects the corresponding 
Output Queue List. Access to the queue list is then requested. 
As in the Routing Service Routine, a spin lock is implemented 
for queue list accesses. The Output Processor must spin on 
any activated queue list lock. Once access is granted, the 
Output Processor checks to see if the selected queue list is 
empty. If the queue list is empty, the Output Processor up- 
dates the buffer's OSW to indicate that the buffer is now in 
the Idle state. Then the Output Processor releases the queue 
list, restarts the poller and returns to the sense loop. 

If the selected queue list is not empty, the Output 
Processor fetches the oldest packet address in the list. The 
associated OSW is changed to indicate that the output buffer 
is in the Busy state. After updating the OSW, the Output 
Processor releases the queue list and restarts the poller. 

START:  If NEW-B = 0, Jmp to START

            *Is there a shift register
             requesting service?
             NO: Loop @ START.

        SRS Polling Port -> Scratch 1

            *YES: Input the address of
             the shift register.

        Syndrome Generator Base Address+Scratch 1 -> Address Latch (µP1-B)
        Syndrome (R) -> Decoder ROM Address Latch; Reset Poller

            *Fetch header Syndrome and
             send it to the Decoder ROM.
             Clear NEW-B and restart the
             poller.

        Decoder ROM Address -> Address Latch (µP2-B)

        [Decoder ROM] @Syndrome(R) -> Q

            *Fetch error word from ROM.

        Header Base Address+Scratch 1 -> Address Latch (µP3-A)

        ALU EXOR Q -> Scratch 2, Header Port(R)

            *Correct the header. Store it
             back into the S.R. Array and
             into Scratch 2.

        Scratch 2 AND Destination Mask -> Q

            *Determine packet destination.

        Q+Output Queue List Base Address -> Address Latch (µP4-B)

            *Select the queue list and the
             OSW of the destination output
             buffer.

        Scratch 1 -> Output Queue List Data Port

            *Place the packet's S.R. Array
             address into Queue List Data
             Port.


Fig. 3.34 Packet Routing Service Routine

REQUEST: Request access to Queue List (N)

            *Request access to the Output
             Queue List selected. If
             access is granted, the data
             in the port is automatically
             written into the queue list.

        If STATUS-B = 1, Jmp to REQUEST

            *If access is not granted,
             loop @ REQUEST. Proceed
             otherwise.

        If OSW (N) = NOT IDLE, Jmp to END

            *Is output buffer idle?

        Set OSW (N) = EMPTY; Release Output Queue List; Jmp to START

            *YES: update OSW, release
             queue list and return to
             the sense loop.

END:    Release Output Queue List; Jmp to START

            *NO: Release queue list and
             return to the sense loop.


Fig. 3.34 Packet Routing Service Routine, continued

Fig. 3.35 Output Service Routine Flowchart, continued

Next, the address of a free data path in the Output Switching
Network is fetched from the Data Path Busy Port. The Output
Processor links the shift register containing the packet
awaiting transmission to the empty output buffer via the free
data path. Once the data link is established, the Output
Processor initiates the packet transfer and increments the
ELIST index pointer EPTR1. EPTR1 now points to an unfilled
location in ELIST. After this update, the address of the
shift register containing the packet being transmitted is
placed into ELIST at the location specified by EPTR1. The
Output Processor then returns to the sense loop.

A listing of this routine is given in Figure 3.36. 
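
The way the Routing and Output Service Routines cooperate through the OSW can be summarized by a small state model. The Python sketch below is a behavioural illustration, not the microcode. The class and method names are hypothetical, the queue list is modelled as a simple first-in first-out queue, ELIST is modelled as a plain list, and the transmission_done event (the buffer emptying after a transmission) is an assumed stand-in for the hardware that makes a Busy buffer request service again.

```python
from collections import deque

IDLE, EMPTY, BUSY = "IDLE", "EMPTY", "BUSY"


class OutputPort:
    def __init__(self):
        self.queue_list = deque()  # shift register addresses, oldest first
        self.osw = IDLE            # status of the associated output buffer

    def route(self, address: int) -> None:
        """Routing Processor side: queue a packet and wake an idle buffer."""
        self.queue_list.append(address)
        if self.osw == IDLE:
            self.osw = EMPTY       # buffer now waits for Output Processor service

    def serve_empty_buffer(self, elist: list):
        """Output Processor side: called when the poller reports an empty buffer."""
        if not self.queue_list:
            self.osw = IDLE        # nothing to send: park the buffer
            return None
        address = self.queue_list.popleft()  # oldest packet address in the list
        self.osw = BUSY
        elist.append(address)      # return the shift register address to ELIST
        return address

    def transmission_done(self) -> None:
        """Assumed hardware event: the buffer has emptied after a transmission."""
        self.osw = EMPTY
```

An Idle buffer is never polled into service on its own; only the arrival of a packet (route) moves it to Empty, which is why the Routing Processor must check the OSW after every queue list update.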

OUTPUT: If SERVICE-C = 0, Jmp to OUTPUT

            *Is there an output buffer
             requesting service?
             NO: Loop @ OUTPUT.

        Output Polling Port -> Q

            *YES: Input the address of the
             buffer requesting service.

        Q+Output Queue List Base Address -> Address Latch (µP1-C)
REQUEST: Request Queue List (N)

            *Select the buffer's output
             Queue List and OSW. Then
             request access.

        If STATUS-C = 1, Jmp to REQUEST

            *Was access granted?
             NO: Request access again.

        If EMPTY-C = 0, Jmp to IDLE

            *YES: Determine if the list
             is empty. List Empty: Jump
             to IDLE.

        [Output Queue List (N)] @OPTR(N) -> Scratch 1; Set OSW=BUSY;
        Release Output Queue List; Reset Poller

            *List Not Empty: Input the
             S.R.# which contains the
             packet to be transmitted.
             Then update the OSW, restart
             the poller and release the
             queue list.

        Output Data Path Status Port Address -> Address Latch (µP2-C)

        Data Path Busy Status Port -> Scratch 2

            *Find a free data path.

        Scratch 2+Data Path Latch A Base Address -> Address Latch (µP3-C)

        Scratch 1 -> Data Path MUX Select Latch A(D)

            *Link the shift register to
             the data path.


Fig. 3.36 Output Service Routine

        Scratch 2+Data Path Latch B Base Address -> Address Latch (µP4-C)

        Q -> Data Path Demux Select Latch B(D)

            *Link the output buffer to
             the data path.

        Data Path Transmit Control Address -> Address Latch

        Scratch 2 -> Data Bus Decoder (M1-C); Update EPTR1

            *Start Packet transfer and
             increment EPTR1.

        Scratch 1 -> [ELIST] @EPTR1; Jmp to OUTPUT

            *Place S.R.# in the Empty
             S.R. List and return to the
             top of the program.

IDLE:   Set OSW=IDLE; Release Output Queue List; Reset Poller;
        Jmp to OUTPUT

            *Update OSW, release queue
             list, restart poller and return
             to the top of the program.


Fig. 3.36 Output Service Routine, continued


4.0 THE MULTIPLE PROCESSOR DESIGN

With the three processor design complete, the next logical
step in the expansion of the system is to include multiple
processors in each processor class. The major incentive behind
this idea is to increase the system throughput through
the use of a multiprocessor architecture. However, two major
problems must be overcome before this goal can be achieved.
The two problems are contention and throughput-limiting functions.
The solutions to these problems are presented as
topics in this chapter since they shape the final system
architecture. Also included in this chapter are the system
architecture, the processors, the hardware and software required
for implementation, and the design trade-offs made. Many
hardware components used in this design are exactly the same
as those used in the three processor design and, therefore,
are not presented in much detail. This chapter begins with
an overview of the system architecture and its operation.

4.1 The System Architecture 

The system architecture is shown in Figure 4.1. This new
architecture is controlled by four classes of processors. The
new class of processors and the system requirements that caused
the additional workload division are discussed in Section 4.4. In
order to examine the duties of each class of processors, a
packet's transfer through the packet switch is traced.

The first function of the switch is to receive and to
store each incoming packet. When a packet arrives, it is
temporarily stored in an input buffer. An input buffer

Figure 4.1 The Multiple Processor System Architecture

containing a newly received packet requests processor service. 
Dedicated hardware pollers sequentially scan their assigned 
group of input buffers searching for full buffers. One group
of input buffers is assigned to one Input Processor. Upon 
finding a full buffer, a polling circuit signals the Input 
Processor it is serving. Immediately, this processor estab- 
lishes a data link between the full buffer and the Shift 
Register Array. In order to set up this link, the processor 
must first find an available data path in the processor's 
dedicated Input Switching Network. Next, the processor must 
find an empty location in the Shift Register Array. Once the 
address of an empty location is fetched from the Empty Shift 
Register List (ELIST), the processor completes the data link.
The processor then initiates the packet's serial transfer into 
the array. As in the previous systems, this transfer is hard- 
ware monitored and terminated, allowing the processor to move 
on to a new task. 

The second function of the switch is to sort each packet 
in the array into groups of packets that are destined for the 
same group of ground stations. Each unique group of stations 
is serviced by one unique Routing Processor. Shift registers 
containing newly arrived packets signal for Packet Sorting 
Processor service. Dedicated hardware pollers scan their 
assigned group of shift registers for new packets. Once a 
polling circuit locates a new packet, the Sorting Processor 
it is serving is notified. This processor fetches the packet's 
header and corrects it. As in all previous systems, the header 
is protected by the BCH error-correcting code. The packet's
destination is then read from the header. Using this information,
the Sorting Processor sends the packet's destination
information and array address to an input/output port associated
with the packet's destination. Each different I/O
port belongs to one unique Packet Routing Processor. Any
Sorting Processor may access any I/O port.

The Packet Routing Processors carry out the switch's 
third function, which is the updating of the Output Queue 
Lists with the addresses of sorted packets. Once an I/O port 
is found to contain valid packet routing data, the I/O port 
polling circuit signals the Routing Processor it serves. The 
Routing Processor responds by fetching the packet's destina- 
tion information. Using this information, the processor 
determines to which ground station the packet is destined. 
Packets leave for a ground station via an output buffer which 
corresponds to that ground station. Each output buffer is 
assigned to only a single ground station. In order to route 
a packet to a particular ground station, the Routing Processor 
must assign the packet to the software output queue list which 
corresponds to the proper output buffer. This assignment is 
made by fetching the packet's array address from the I/O port 
and placing it into the proper queue list. Each Routing 
Processor controls a unique group of output queue lists. A 
packet is considered routed once its array address is placed 
into one of the N queue lists. 

The fourth and final function of the switch is to transmit
the routed packets to their final destinations. This job
belongs to the Output Processors. When an output buffer 
empties due to a completed packet transmission, the buffer 
requests processor service. Dedicated hardware pollers 
sequentially scan their own group of output buffers in search 
of empty buffers. When an empty buffer is found by a polling 
circuit, the Output Processor served by this poller is in- 
formed. The processor then accesses the output queue list 
belonging to the empty buffer. The address of the oldest 
packet waiting for transmission to this destination is fetched 
from the queue list. Next, the processor finds a free data 
path in its dedicated Output Switching Network. A link is 
established between the shift register containing the packet 
to be transferred and the empty buffer via the free data path. 
Once this link is complete, the packet transfer is initiated 
by the processor. Automatic hardware controls this serial 
packet transfer. As soon as an output buffer is loaded, the 
packet is automatically transmitted to the ground station by 
hardware external to the packet switch. While the internal 
hardware transfer takes place, the Output Processor updates 
ELIST by placing the packet's array address into ELIST.

If an output queue list is empty when its associated
output buffer becomes empty, the Output Processor must place
the buffer in the "idle" state. An idle buffer will remain
idle until a new packet arrives for that buffer. The Routing
Processor will assign the new packet to the empty queue list.
Next, the Routing Processor must change the buffer's status
to indicate that the buffer is empty and requires service from
the Output Processor servicing that particular buffer.

4.2 Shared Resources 

In the three processor design, contention problems between
the different classes of processors are discussed in
depth. A workable solution is found and implemented for each
shared resource. In this multiple processor design, new contention
problems arise. Since there can be more than one
processor in each processor class, contention may occur between
processors of the same class. The contention problems of these
resources can be solved with design changes within the subsystem
they serve. These design changes may affect the architecture
of that subsystem, but they do not affect the other
packet switch functions. Thus, the resource allocation schemes
required by these shared resources are discussed in the sections
which describe each subsystem of the switch.

However, there are several resources which are shared
by two or more classes of processors. The design of these
"Multi-Access Resources" and the formation of their allocation
schemes may affect the architecture of two or more packet
switch subsystems. Thus, these resources must be considered
before the entire architecture of the packet switch can be
designed. A review of the three processor design reveals
that there are three resources which will become Multi-Access
Resources in the multiple processor system. These resources
are:

1. The Shift Register Array

2. The Output Queue Lists

3. ELIST

Now identified, each of these resources must be investigated
and redesigned if necessary.

4.2.1 The Shift Register Array 

Each Input (Output) Processor's switching network
may be linked to any location in the Shift Register Array.
However, no two Input (Output) Switching Networks will ever
access the same location simultaneously. This is due to the
fact that two or more Input (Output) Processors can never
fetch the same address for a particular array location from
ELIST (an Output Queue List) simultaneously as they service
packets. As mentioned earlier, the array is capable of receiving
a new packet while concurrently transmitting the older
packet from the same location. Thus, no contention problems
will arise between the Input and Output processors even if
they access the same location concurrently. However, unless
only one Sorting Processor is allowed to access a single location
at one time, contention problems will arise. These
problems can be eliminated by the assignment of each group of
locations to one Sorting Processor. Since packets may be
stored in the array with an uneven distribution, the locations
assigned to each Sorting Processor should be interleaved.
This ensures that no Sorting Processor is forced to
carry a disproportionate workload due to uneven packet storage.
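
The benefit of interleaving can be seen in a short Python sketch. The modulo mapping below is one plausible way to interleave locations and is an assumption; the report does not state the exact assignment rule. The contiguous block mapping is included only as a contrast, and the array size and processor count in the example are arbitrary.

```python
def interleaved_owner(location: int, processors: int) -> int:
    """Assumed interleaving: consecutive locations go to consecutive processors."""
    return location % processors


def block_owner(location: int, processors: int, array_size: int) -> int:
    """Contrast case: each processor owns one contiguous block of locations."""
    block = -(-array_size // processors)  # ceiling division
    return location // block


def workload(occupied, processors: int, owner) -> list:
    """Count how many occupied locations fall to each Sorting Processor."""
    counts = [0] * processors
    for location in occupied:
        counts[owner(location)] += 1
    return counts
```

With a 16-location array, four Sorting Processors, and packets stored only in locations 0 through 7, the interleaved mapping gives every processor two packets, while the block mapping gives two processors four packets each and leaves the other two with none.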

4.2.2 The Output Queue Lists

In the three processor design, the Output Queue 
Lists are not completely free from contention problems. They 
are shared between the Routing Processor and the Output Pro- 
cessor. In the multiple processor system, the lists are needed 
by the multiple processors in both the Routing and the Output 
classes of processors. This requirement adds new contention 
problems for these already contention-plagued resources. In 
order to keep the amount of processor contention from increas- 
ing, a restriction regarding processor access to these lists 
must be made. Only one Routing Processor and only one Output 
Processor will be allowed to share a list. This requirement 
changes the workload of the Routing Processor used in the 
multiple processor packet switch. 

In the three processor design, the Routing Processor 
services the entire Shift Register Array and all of the N
output queue lists. A packet in any Shift Register Array 
location can require routing to any output buffer. A packet 
is considered routed only after its array address is placed 
into the proper queue list. 

As described earlier, the Shift Register Array is now 
divided into groups of locations, each of which is serviced 
by a unique processor. This architecture, using the previous 
Routing Processor structure, would require that all the 
Routing Processors be allowed to access any of the N queue 
lists. Since this requirement is in conflict with the pre- 
vious design decision that limited one Routing Processor to 
a list, a new architecture is needed. 

The new architecture will force a division of the Routing
Processor's workload. This workload division requires the
implementation of a new class of processors which is needed
to carry out some of the tasks formerly assigned to the Routing
Processor. The new class of processor is the Sorting Processor.
Each Sorting Processor is assigned to a group of
shift register array locations and is allowed to send
routing data to any Routing Processor. Each Routing Processor
is assigned to a unique group of Output Queue Lists. These
two classes of processors are linked by a contention-free
hardware interface. Details concerning the actual implementation
of this interface and the new processors are presented
in Section 4.4.


4.2.3 ELIST

The Empty Shift Register List (ELIST) is accessed 
by every Input Processor and every Output Processor as well. 
The previous ELIST structure cannot handle this requirement. 
Since only one Input Processor and only one Output Processor 
can access ELIST without interference, a new ELIST allocation 
scheme is needed to provide the multiple processors with 
contention-free access.

The first scheme considered is the division of ELIST 
into smaller lists. Each list would then be assigned to one 
Input Processor and to one Output Processor. However, in 
order for this scheme to work properly, the workload must be 
distributed evenly among the Input Processors and also among 
the Output Processors. An example of how an uneven packet
distribution can cause this scheme to fail is easily illustrated.

Assume each user is transmitting packets at his maximum
allowable rate. Assume further that most of the packets
sent are destined to only one or two users that are serviced
by the same Output Processor. After a short time, all but
one of the Input Processors will have depleted their supply
of array addresses. Only the Input Processor that shares the
same ELIST with the busy Output Processor will continue to
receive new array addresses. This case illustrates the need
to supply ELIST data to each Input Processor through the use
of a data distribution scheme. In addition, this
example clearly demonstrates that ELIST must remain a
single resource that is shared through the use of an allocation
scheme. The idea of an ELIST data distribution system
is the foundation on which two ELIST implementations are
based. One design is based around an Elist Support Processor
while the other design uses only automatic hardware. These
two designs are discussed in detail below.
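
The failure of the divided-ELIST scheme can be reproduced with a toy simulation. The Python sketch below is a deliberately simplified model and every parameter in it is an assumption: each Input Processor accepts one packet per step while it has a free address, all packets are destined to users served by Output Processor 0, and that processor transmits one packet per step, returning the freed address to the private list it shares with Input Processor 0.

```python
def simulate_partitioned_elist(processors: int = 4, per_list: int = 3, steps: int = 10) -> list:
    """Return the number of free addresses left in each private list."""
    free = [per_list] * processors   # free-address count of each private list
    backlog = 0                      # packets waiting for the busy Output Processor
    for _ in range(steps):
        for i in range(processors):  # every Input Processor tries to accept a packet
            if free[i] > 0:
                free[i] -= 1
                backlog += 1
        if backlog > 0:              # Output Processor 0 sends one packet per step
            backlog -= 1
            free[0] += 1             # the freed address returns only to list 0
    return free
```

With four processors and three addresses per list, the lists of Input Processors 1 through 3 are empty after three steps and never refill, while list 0 keeps being replenished. A single shared ELIST avoids this because a freed address can be handed to whichever Input Processor needs one.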

4.2.3.1 Processor-Controlled ELIST

Since there are no constraints regulating the use of
support processors, the use of a processor to coordinate the
operation of the ELIST data distribution system is a logical
choice. The processor-controlled ELIST system architecture
is presented in Figure 4.2.

Fig. 4.2 Processor-Controlled ELIST Architecture

The operation of this ELIST data distribution system is
straightforward. Each Output Processor sends its ELIST data
to a dedicated I/O port. These ports support the common
DAV/DAC handshaking protocol. A dedicated poller scans these
ports in search of a full port. When the poller finds a full
port (identified as full by its activated DAV flag), it signals
the Elist Support Processor. The support processor then
fetches the data and sets the DAC flag. The Elist Support
Processor then checks to see if any Input Processor-linked
I/O port requires data. Each of these I/O ports is assigned
to one Input Processor. Again, the DAV/DAC flag handshaking
is used, and a dedicated hardware poller scans
these ports. If the poller has located an empty port (signalled
by an activated DAC flag), the Elist Support Processor
sends the ELIST data directly to the empty port. If no I/O
port is empty, the data is stored into the ELIST RAM. If an
Input Processor's I/O port empties before the Elist Support
Processor has received data from an Output Processor, the
support processor fetches the data from the RAM and then sends it to
the empty port.
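
The distribution rule followed by the Elist Support Processor can be sketched as follows. This Python model is an illustration under stated assumptions: an empty Input Processor port is represented by None, the ELIST RAM is treated as a first-in first-out store (the report does not specify the order), the first empty port found is the one filled, and the class and method names are hypothetical.

```python
from collections import deque


class ElistSupport:
    def __init__(self, input_ports: int):
        self.input_port = [None] * input_ports  # None models an empty port
        self.ram = deque()                      # stands in for the ELIST RAM

    def from_output_processor(self, address: int) -> None:
        """Handle one address fetched from a full Output Processor port."""
        for i, held in enumerate(self.input_port):
            if held is None:
                self.input_port[i] = address    # forward directly to an empty port
                return
        self.ram.append(address)                # no empty port: store in the RAM

    def input_processor_reads(self, port: int):
        """An Input Processor takes its address; the port is refilled from RAM if possible."""
        address = self.input_port[port]
        self.input_port[port] = self.ram.popleft() if self.ram else None
        return address
```

With two Input Processor ports, three returned addresses 10, 11 and 12 fill both ports and leave 12 in the RAM; as soon as port 0 is read, 12 moves from the RAM into that port.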

Since this ELIST data distribution system is controlled
by a processor, it can serve the Input Processors and the
Output Processors only as fast as the Elist Support Processor
executes its task. The Elist Support Processor can support
any packet switch throughput up to 3 Mega-packets per second
(see Appendix). This ELIST structure is therefore a throughput-limiting
function: adding processors to the
other four classes will never increase the system throughput
beyond the upper bound of 3 Mega-packets per second. Thus,
this system is replaced by a hardware-controlled data distribution
system, which is the next topic of discussion.

Although the processor-controlled system is not used in 
this particular architecture, the processor architecture, the 
interface hardware and the software required for implementa- 
tion are located in the Appendix. This material is presented 
because the processor-controlled ELIST scheme is less complex 
than the hardware-controlled ELIST and it can offer the user 
some degree of flexibility in that the processor software can 
be custom tailored. Thus, the processor-based ELIST is the 
recommended implementation for packet switches operating below 
3 Mega-packets per second.

4.2.3.2 Hardware-Controlled ELIST

Since hardware is relatively faster than software, 
a completely hardware-controlled ELIST will serve the packet 
switch at the fastest rate possible. This design removes the 
previous throughput limitations encountered in the ELIST 
Support processor-based design. 

ELIST interfaces to the Input Processors and the Output
Processors through input/output ports. A dedicated port is
assigned to each processor accessing the list. In order to
explain how the system services the two classes of processors
(Input and Output), the operation of the data storage
function is described first.

Figure 4.3 contains the ELIST Data Input Port architecture.
When an Output Processor has a new array address for
ELIST, it checks the port's Data Accepted (DAC) flag. If the 
previous data was fetched by the ELIST hardware, this flag 
is set. A set DAC flag allows the Output Processor to load 
its port with the new data. Once the data is loaded into the 
port, the processor sets the Data Available (DAV) flag. If 
the DAC flag is not set, the Output Processor must wait until 
this flag gets set. 

The ELIST storage hardware is controlled by a polling 
circuit which is driven by a counter. All the ELIST Data 
Input Port DAV flags are sent to the ELIST DAV MUX. The Out- 
put of this MUX generates the FULL PORT signal which reflects 
the status of the DAV flag selected. Selection of the DAV 
flags is controlled by the value of the counter. The value 
of this counter is also supplied to the ELIST Input Enable 
Demux. The activated Demux output enables the handshaking 
logic and the tri-stated port output of the addressed I/O 
port. 

If the addressed I/O port's DAV flag is set, the FULL 
PORT signal becomes activated. This activated signal sets 
the STORE DATA Flip-Flop. Once set, this flip-flop halts the
poller's counter. Simultaneously, the flip-flop activates 
the one-shot that generates the active-low WRITE signal. 

While the WRITE signal is activated, the two-port ELIST RAM 
is enabled in the write mode. The data from the enabled I/O 
port is then strobed into the RAM. 

The ELIST RAM Structure is shown in Figure 4.4. The 
ELIST Data Structure is presented in Figure 4.5. In this
data structure, both the read and write operations are per- 
formed before the index pointers are updated. Both pointers 
are updated by being incremented. Once the bottom of the 
list is encountered, they roll over and return to the top of 
the list. 

Once the write operation is complete, the low-active 
WRITE signal goes high. The leading edge of this low-to-high 
transition fires the one shot which activates the DATA 
RECEIVED signal. This activated signal clears the DAV flag 
and sets the DAC flag belonging to the enabled I/O port. The 
clearing of the DAV flag clears the STORE DATA flip-flop. The 
reset flip-flop activates the UPDATE0 signal which increments 
the counter which serves as the write index pointer (EPTR0).

In addition, the reset flip-flop enables the poller to re- 
start. Figure 4.6 contains the timing diagram and the signifi- 
cant events for this entire operation. 
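
The storage sequence described above can be sketched as a small Python model. This is only an illustrative sketch, not the hardware design itself: the class names `InputPort` and `ElistStore`, the RAM depth, and the assumption that loading a port clears its DAC flag are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InputPort:
    """One ELIST Data Input Port with its DAV/DAC handshake flags."""
    data: int = 0
    dav: bool = False   # Data Available: set by the Output Processor
    dac: bool = True    # Data Accepted: set by the ELIST hardware

    def load(self, address: int) -> bool:
        """Output Processor side: load the port only if DAC is set."""
        if not self.dac:
            return False              # the processor must wait and retry
        # Assumption: loading the port clears DAC and sets DAV.
        self.data, self.dac, self.dav = address, False, True
        return True


@dataclass
class ElistStore:
    """Polling counter plus a circular RAM with a write index pointer."""
    ports: list
    size: int = 8                     # assumed RAM depth
    ram: list = field(default_factory=list)
    write_ptr: int = 0
    scan: int = 0

    def __post_init__(self):
        self.ram = [None] * self.size

    def step(self) -> None:
        """One poller step: service the addressed port if its DAV flag is set."""
        port = self.ports[self.scan]
        if port.dav:                                   # FULL PORT is active
            self.ram[self.write_ptr] = port.data       # WRITE strobe
            port.dav, port.dac = False, True           # DATA RECEIVED
            # The pointer is updated only after the write, and rolls over.
            self.write_ptr = (self.write_ptr + 1) % self.size
        self.scan = (self.scan + 1) % len(self.ports)  # poller moves on
```

In this model a second `load` on the same port fails until the poller has visited the port, which mirrors the requirement that the Output Processor wait for the DAC flag.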

The second function of the ELIST data distribution system 
is to supply each Input Processor with the address data stored 
in the ELIST when required. Figure 4.7 contains the ELIST Data
Output Port system which carries out this task. The primary 
function of this system is to keep each ELIST Data Output Port 
filled with valid data. If the ports are kept full, no Input 
Processor will be forced to wait for data. 

As with the ELIST Data Input Ports, each Data Output Port
is assigned to one processor. Each port has its own
handshaking flags. When an Input Processor needs data from the
ELIST, it checks the I/O port's DAV flag. If this flag is
set, valid data is held in the port and the Input Processor
fetches this data immediately. If the DAV flag is not set,
the processor must wait for service.

Fig. 4.4 ELIST RAM Structure

Fig. 4.6 ELIST Input Port Hardware Timing Diagram

Fig. 4.7 ELIST Data Output Port

Once data is fetched from an I/O port, the Input Pro- 
cessor accessing this port sets the DAC flag. Every I/O 
port's DAC flag is sent to the ELIST DAC MUX. The output 
of this MUX generates the EMPTY PORT signal. Selection of 
the DAC flag is controlled by the counter which drives the
MUX. The value of this counter is also sent to the LOAD
Demux and to the DATA SENT Demux. The activated Demux
outputs enable the handshaking logic and the data loading
circuitry of the addressed I/O port.

If the selected I/O port's DAC flag is set, the EMPTY
PORT signal is activated. This signal then sets the SEND DATA 
flip-flop. Once set, this flip-flop halts the poller and 
activates the one-shot that produces the active-low LOAD sig- 
nal. While the LOAD signal is active, the data on the ELIST 
Output Data Bus is strobed into the enabled I/O port. The 
data on the ELIST Output Data Bus is supplied from the RAM 
location selected by the read index pointer. 

Once the data transfer has been finished, the active-low
LOAD signal goes high. The leading edge of this low-to-high 
transition activates the one shot that generates the DATA SENT 
signal. The DATA SENT signal then clears the enabled port's 
DAC flag and sets its DAV flag. The reset DAC flag clears 
the SEND DATA flip-flop, enabling the poller to restart its 
scanning. Simultaneously, the DATA SENT signal goes low,
resulting in the updating of the read index pointer. This
update increments the hardware counter which serves as the
read index pointer. A timing diagram of the complete ELIST
output function, along with the significant timing events,
is presented in Figure 4.8.

4.3 The Input System 

The Input System consists of the Input Buffers, the 
Input Processors, the Input Switching Networks and the Input 
Polling Circuits. This system interfaces to the Shift Register
Array and the ELIST Data Distribution System. The architec- 
tural organization of this system is presented below. 

However, before the architecture can be designed and 
explained, the contention problem related to the Input 
Switching Network must be solved. As in the previous designs, 
the Input Switching Network provides programmable data paths 
from the input buffers to the Shift Register Array. In order 
to provide the address of an available data path in the net- 
work, the status of each path is monitored by the hardware in 
the Data Path Busy Port. This port is accessed by the Input 
Processor. 

If a single Input Switching Network is used in the
multiple processor system, access to the status port must be
granted to only one Input Processor at a time. Since several
Input Processors will require access to this resource, a
resource allocation scheduling scheme is needed. This scheme
will require new hardware and additional software. The
additional software will reduce the throughput of the Input
Processors. Throughput may be reduced even further if the
Input Processors are forced to wait for the resource whenever
it is busy. The solution implemented in this design eliminates
contention completely by allocating a dedicated Input
Switching Network to each Input Processor.

Fig. 4.8 ELIST Output Port Hardware Timing Diagram

4.3.1 Architectural Workload Division 

In the three processor design, the workload is 
divided into three relatively independent tasks. This scheme 
works quite well, in that each processor can carry out its 
assigned task without interference from the other processors. 
However, in the multiple processor design, the workload of 
the packet switch must be sub-divided within the three func- 
tions. Processors in the same class must share the workload 
within the function assigned to that processor class. There- 
fore, if the proper architecture is not implemented, a pro- 
cessor may be faced with interference from the other processors 
in its own class. 

The processors controlling the Input System can be
organized using one of two techniques: Master/Slave Scheduling
or Separate Systems [5]. The Master/Slave Scheduling scheme
is organized such that one processor maintains the status of
all the "Slave Processors" and the uncompleted tasks. This
"Master Processor" schedules the work for each of the Slave
Processors.

The Separate Systems scheme is organized such that each
processor carries out its assigned tasks in parallel with the 
other processors. The assignment of tasks for each processor 
is fixed by the system architecture. There is no dynamic 
allocation of processors to tasks, as in the Master/Slave
Scheduling System. In addition, each processor is assigned 
dedicated memory and dedicated I/O devices. 

These two schemes are the foundation on which two archi- 
tectures for the input system are based. Each of the two 
architectures is presented below. Also included are the
design considerations which led to the selection of the 
Separate System scheme. 

4.3.1.1 Master/Slave Scheduling

One possible implementation of the Input System 
using the Master/Slave Scheduling scheme is presented in 
Figure 4.9. This figure contains a block diagram of the 
Input System Architecture A. 

Input System Architecture A uses a hardware poller to
locate full input buffers. Once the poller finds a full
buffer, which is indicated via an Input Status Word (ISW), it
stops and signals the Job Scheduling Processor (Master
Processor). The Job Scheduling Processor inputs the address of
the full buffer from the poller. The Job Scheduling Processor
then updates the ISW to indicate "partial service" and
restarts the poller. Next, the Job Scheduling Processor fetches
the address generated by the priority encoder. This encoder
is driven by the Busy flip-flops that indicate the current
status of each Slave Processor. The encoder supplies the
address of the free Slave Processor which has the highest
assigned priority. Using this address, the Job Scheduling
Processor assigns the task of servicing the full buffer to
the free Slave Processor. This Slave Processor sets its Busy
flip-flop and begins the task of inputting the packet to the
Shift Register Array.

Fig. 4.9 Input System Architecture "A"

The main advantage of this scheme is that the workload
is shared by all the available Slave Processors, regardless
of the distribution of the incoming packets. Since the Slave
Processors are assigned to tasks (incoming packets) and not
to the input buffers, all the processors will be utilized
even if only one or two channels are heavily loaded. An
additional advantage of this system is that under lightly
loaded conditions, the low priority Slave Processors will be
free. These low priority processors could be programmed to
execute background functions. Service for the input buffers
could then be interrupt driven.
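
The assignment step performed by the Job Scheduling Processor in Architecture A can be sketched in Python as follows. This is a simplified illustration only. The function names and the convention that index 0 is the highest priority are assumptions made for the example, and the case in which every Slave Processor is busy is handled here by simply returning `None`.

```python
def priority_encode(busy):
    """Return the index of the free Slave Processor with the highest priority.

    Index 0 is assumed to carry the highest priority. Returns None when
    every Busy flip-flop is set.
    """
    for index, is_busy in enumerate(busy):
        if not is_busy:
            return index
    return None


def assign(busy, full_buffer):
    """Hand the full buffer to the free slave chosen by the encoder."""
    slave = priority_encode(busy)
    if slave is None:
        return None                 # no free Slave Processor at this time
    busy[slave] = True              # the slave sets its Busy flip-flop
    return slave, full_buffer


busy_flags = [True, False, False, True]
print(assign(busy_flags, 17))       # (1, 17)
```

Because the lowest-numbered free processor is always chosen first, the higher-numbered (low priority) processors stay free under light load, which is the property the text relies on for background functions.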

There are two disadvantages in Architecture A. The
first disadvantage is a reliability problem. If the Job
Scheduling Processor fails, the entire packet switch becomes
inoperative. One possible solution to this problem is the
implementation of additional Job Scheduling Processors that
are assigned their own dedicated Slave Processors. Another
possible solution is to have a Slave Processor replace the
Job Scheduling Processor in the event of a failure. Both of
these schemes will add complexity to the hardware and/or
software.

The second disadvantage is the amount of hardware re- 
quired to allow any Slave Processor to serve any input buffer. 
This architecture could require thousands of control lines 
for just one control signal. An example of this problem is 
given below for a typical system:

N = 100 users (100 input buffers required)

# of Slave Processors = 4 processors

# of Input Switching Network Data Paths
= 10 paths/processor

# of DATA IN lines = 1 line/user/data path

(100 users) • (4 processors) • (10 paths/processor) • (1 line/user/
data path) = 4000 DATA IN lines.

As a result of this finding, a new multiprocessor archi- 
tecture for the Input System is proposed. The new architecture 
is discussed below. 

4.3.1.2 Separate Systems

The Input System Architecture is presented in 
Figure 4.10. Each Input Processor controls a complete input 
system. Each of these systems operates independently of the
others. The size of these systems is determined by the
number of input buffers assigned to each system. Once a group
of buffers is assigned to an Input Processor, the buffers remain
fixed to that processor. Therefore, in this scheme. 
scheme to the complexity of the Master/Slave Scheduling scheme,
the previous example using the DATA IN lines will be continued:


N = 100 users (100 input buffers required)

# of Input Processors = 4 processors

# of Input Switching Network Data Paths
= 10 paths/processor

# of DATA IN lines = 1 line/user/data path

(25 users/separate system/processor) • (4 separate systems) • (10
data paths/processor) • (1 DATA IN line/user/data path) =
1000 DATA IN lines

The Master/Slave Scheduling scheme required 4000 DATA IN
lines. Since the DATA IN line is only one of two Input
Switching Network signals that require 1 line per user per
data path, the Separate Systems scheme is clearly less
complex.
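
The two line counts can be checked with a few lines of Python. This is only a worked restatement of the arithmetic in the two examples. The function name `data_in_lines` is a hypothetical helper, and the Separate Systems case assumes the users divide evenly among the processors.

```python
def data_in_lines(users_per_network, networks, paths_per_network):
    """DATA IN lines: one line per user per data path, over all networks."""
    return users_per_network * networks * paths_per_network


USERS, PROCESSORS, PATHS = 100, 4, 10

# Master/Slave: every processor's network must reach all 100 input buffers.
master_slave = data_in_lines(USERS, PROCESSORS, PATHS)

# Separate Systems: each network reaches only its own 25 input buffers.
separate = data_in_lines(USERS // PROCESSORS, PROCESSORS, PATHS)

print(master_slave, separate)  # 4000 1000
```

The ratio between the two counts equals the number of processors, so the wiring saving of the Separate Systems scheme grows as processors are added.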

The one major drawback with this architecture is that 
idle or lightly loaded processors cannot be assigned to 
heavily loaded channels if those channels are under the con- 
trol of another processor. Thus, some Input Processors may 
become heavily loaded while the other Input Processors remain 
idle or under-utilized. However, this architecture is con- 
sidered to be the best compromise since it is not as complex 
as Architecture A. Therefore, this is the architecture that
is chosen for the actual implementation.

Since this architecture merely divides the workload by
means of buffer assignments, no major hardware changes are
required. The Input Buffers, the Input Switching Networks
and the polling circuits are identical to those used in the
three processor design. Therefore, these system components
are not presented in this chapter (see section 3.1 for a
review of these components).

4.3.2 The Input Processors 

The Instruction Execution Units and the Microprogram 
Control Units for the Input Processors are the same as those 
used in the three processor design (see section 3.2 for a
review). However, since ELIST has been redesigned to meet new
requirements, the Input Processors' Microprogram word is
different. Figure 4.11 contains the IEU Control Fields in the
microprogram word. Figure 4.12 contains the MCU Control
Fields and the Jump Control logic function for the Input
Processor. Again, their functions are similar to those in
the three processor design as discussed in section 3.2.4.

4.3.3 The Input Software Routine 

The Input Service Routine is sense-loop driven. The
Input Processor remains in the loop until the Input Polling
Circuit locates a full input buffer. Once a full buffer is
found, the processor leaves the loop and fetches the address
of the buffer from the poller. Next, the Input Processor
fetches the address of a free data path in its dedicated Input
Switching Network. Simultaneously, the processor clears the
buffer's service request flag and restarts the poller. The
Input Processor then checks the DAV flag at its ELIST Data
Port. If this flag is not set, the processor loops until the
flag becomes set. When the flag is set, the Input Processor
fetches the address of an empty shift register from the port
and sets the DAC flag.

Fig. 4.11 Input Processor IEU Microprogram Control Fields

Using these three addresses, the Input Processor links 
the full input buffer to the empty shift register via the 
free data path. Once this link is established, the processor 
initiates the data transfer and returns to the loop. A flow 
chart of this software routine is presented in Figure 4.13. 

A listing of this program is given in Figure 4.14. 
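
The sequence of steps in the Input Service Routine can also be sketched in Python. This is a rough behavioural model rather than the microprogram itself. The data layout (a list of pending buffer addresses, a list of path busy flags, a dictionary for the ELIST Data Port), the `wait` callback that stands in for the ELIST hardware, and the clearing of DAV when the data is taken are all assumptions made for the illustration.

```python
def input_service(pending, path_busy, elist_port, wait):
    """Serve one full input buffer and return the link that was set up."""
    buffer_addr = pending.pop(0)       # address from the halted poller
    path = path_busy.index(False)      # a free data path is assumed to exist
    while not elist_port["dav"]:       # WAIT loop at the ELIST Data Port
        wait(elist_port)               # stand-in for the ELIST hardware
    shift_register = elist_port["data"]
    elist_port["dav"] = False          # assumed: taking the data clears DAV
    elist_port["dac"] = True           # signal Data Accepted
    path_busy[path] = True             # in hardware the Busy Port tracks this
    return {"buffer": buffer_addr, "path": path,
            "shift_register": shift_register}


def refill(port):
    """Hypothetical ELIST hardware action: load shift register number 7."""
    port["data"], port["dav"], port["dac"] = 7, True, False


port = {"dav": False, "dac": True, "data": None}
link = input_service([12, 30], [True, False, False], port, refill)
print(link)  # {'buffer': 12, 'path': 1, 'shift_register': 7}
```

The returned dictionary corresponds to the three addresses the routine uses to link the full input buffer to the empty shift register through the free data path.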

4.4 The Routing System 

The Routing System consists of the Shift Register Array, 
the Sorting Processors, the Shift Register Array polling cir- 
cuit, the Routing Processors, and the Packet Routing Data I/O 
ports. The Routing System interfaces to the Input System, 
the Output Queue Lists and the Output System. The archi-
tectural organization of this system is presented below. 

4.4.1 Architectural Workload Division 

As discussed in section 4.2, the Routing function
as defined in the three processor architecture can no longer
meet the requirements of the multiple processor design. The
principal requirement is that each Output Queue List be
accessed by only one Routing Processor. This constraint is
satisfied by dividing the Routing function into two smaller
tasks. Each task, the sorting of packets and the routing of
packets, is assigned to one class of processors. The Packet
Sorting Processor is assigned the task of sorting each packet
in the array. The sorting function requires that a packet's
destination and its location in the array be sent to the
proper Packet Routing Processor. The Packet Routing Processor
then uses this information to route the packet by placing the
packet's array address into the proper output queue list.
The system architecture for a single Packet Sorting Processor
is presented in Figure 4.15. Figure 4.16 illustrates the
system architecture for a single Routing Processor.

Fig. 4.13 Input Service Routine Flowchart, continued

INPUT:  If IBSR-A = 0, JMP to INPUT

            *Is there an input buffer requesting service?
             NO: Loop @ INPUT.

        Input Polling Port -> Q

            *YES: Input the buffer's address.

        Input Data Path Status Port Address -> Address Latch (µP1-A);
        Reset Poller
        Data Path Busy Status Port -> Scratch 1

            *Find a free data path, clear IBSR-A and restart
             the poller.

        ELIST Data Port Address -> Address Latch (µP2-A)

WAIT:   If ELIST DAV = 0, JMP to WAIT

        ELIST Data Port -> Scratch 2; Send a DAC

            *When the data becomes available, input the shift
             register number from the ELIST port.

        Scratch 1 + Data Path Latch A Base Address -> Address Latch (µP3-A)
        Q -> Data Path MUX select Latch A (D)

            *Link the buffer to the data path.

        Scratch 1 + Data Path Latch B Base Address -> Address Latch (µP4-A)
        Scratch 2 -> Data Path DeMUX select Latch B (D)

            *Link the empty shift register to the data path.

        Data Transmit Control Address -> Address Latch
        Scratch 1 -> Data Bus Decoder (Ml-A); JMP to INPUT

            *Start data transfer and return to sense loop.

Figure 4.14 Input Service Routine

Implementation of this scheme does not require the
redesign of the Shift Register Array, the Shift Register Array
Polling Circuit, or the Output Queue Lists. Therefore, these
components are not discussed in this chapter (see
section 3.1 for a review of this hardware). However, the
processors and their software routines are different from
those in the previous design. In addition, a new component,
the Packet Routing Data Port, is implemented in this
architecture. Therefore, these topics are discussed. The Packet
Routing Data Ports are presented below as the first topic.

4.4.2 Packet Routing Data Ports 

The Packet Routing Data Ports provide the necessary 
interface between the Packet Sorting Processor and the Packet 
Routing Processor. Each Packet Routing Processor is assigned 
its own Packet Routing Data Port. Any Packet Sorting
Processor can send data to any Packet Routing Data Port.
Associated with every Packet Routing Data Port is a dedicated RAM
which is external to the Routing Processor it serves. The
function of these ports is to accept the routing information
from the Packet Sorting Processors, to store the routing
information in the external RAM and to provide this data to
the Packet Routing Processor when needed.

There exists an alternative to using the external RAM 
for storage, but it is considered too costly to implement. 

The alternate scheme requires that whenever a Sorting Processor 
places data into a Routing Processor's data port, the Routing 
Processor is to be notified by an interrupt. This interrupt 
signal is activated by the Sorting Processor. The Routing 
Processor responds to the interrupt by suspending the Packet 
Routing Routine in order to fetch the data from the full port. 
Once fetched, this data is stored in an internal software queue. 
This scheme is considered too costly because: 

1) Additional processor hardware will be required to 
handle the interrupts. 

2) The additional required software overhead will 
increase execution times and reduce throughput. 

These are the reasons the hardware stack scheme is implemented. 

The operation of a Packet Routing Data Port can be best 
explained by tracing the procedure that a Packet Sorting Pro- 
cessor follows to send data to a port. Once a Packet Sorting 
Processor determines which port is to receive the data, it 
checks the associated DAC flag. If this flag is not set, the
processor waits for it to be set. When the flag is set, the 
Packet Sorting Processor sends the packet's destination infor- 
mation to the Packet Destination Data Latch. Next, the 
Packet Sorting Processor sends the packet's shift register 
array address to the Packet Array Address Data Latch. Once 
both latches are loaded, the Packet Sorting Processor sets 
the DAV flag which automatically clears the DAC flag. A 
single Packet Routing Data Port is illustrated in Figure 4.17. 

Every DAV flag is scanned by a hardware polling circuit. 
This polling circuit is presented in Figure 4.18. When an 
activated DAV flag is found by the poller, the STORE DATA 
flip-flop is set. The set flip-flop halts the poller and 
activates the one-shot that generates the low-active WRITE 
signal. The outputs of the two data latches associated with
the active DAV flag are enabled. The enabled output of the
Packet Destination Data Latch is sent to the Packet Destina- 
tion Data RAM and the output of the Packet Array Address Data 
Latch is sent to the Packet Array Address Data RAM. Both of 
these RAM's are enabled in the write mode by the activated 
WRITE signal. The WRITE signal is held activated until the 
data is strobed into the RAM's. This data is stored into 
the two RAM's at locations which have the same address since 
both RAM's share a single index pointer. The architecture 
of the Packet Routing Data RAM's is presented in Figure 4.19. 

Once the write operation is complete, the active-low
WRITE signal goes high. The low-to-high transition of this
signal activates the one-shot that generates the Flag UPDATE
signal. The Flag UPDATE signal clears the DAV flag and sets
the DAC flag. The clearing of the DAV flag resets the STORE
DATA flip-flop. The reset flip-flop activates the INDEX
UPDATE signal which increments the hardware counter that serves
as an index pointer. The data structure for the Packet
Routing Data List is given in Figure 4.20. Both the read and
the write operations take place before the index pointers are
incremented. When the two index pointers are equal, the list
is assumed to be empty.

Fig. 4.18 Packet Routing Data Port Polling Circuit

Fig. 4.19 Packet Routing Data

When the Packet Routing Processor needs to fetch data 
from the RAM's, it first selects the Packet Destination Data 
RAM and then fetches the data. Next, the Packet Routing 
Processor selects the Packet Array Address RAM and fetches 
the data. The processor increments the index pointer once 
both read operations are finished. Two-port RAM's are used 
to allow a simultaneous read by the processor and write by 
the hardware. Since the list is assumed to be empty when the 
index pointers are equal, a read and a write operation will 
never occur at the same location. 
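
The Packet Routing Data List can be modelled in Python as two parallel arrays that share a write index, with a separate read index held by the processor. This is a minimal sketch under stated assumptions: the class name `RoutingDataList`, the list depth, and the rollover of the pointers at the end of the RAM are illustrative choices, and overflow of the list is not modelled because the text does not describe it.

```python
class RoutingDataList:
    """Two RAMs written at one shared index; empty when the pointers match."""

    def __init__(self, size=8):
        self.size = size                    # assumed RAM depth
        self.dest_ram = [None] * size       # Packet Destination Data RAM
        self.addr_ram = [None] * size       # Packet Array Address Data RAM
        self.write_ptr = 0                  # hardware counter
        self.read_ptr = 0                   # processor's index pointer

    def is_empty(self):
        return self.read_ptr == self.write_ptr

    def hardware_write(self, destination, array_address):
        """Port hardware: store both words, then increment the pointer."""
        self.dest_ram[self.write_ptr] = destination
        self.addr_ram[self.write_ptr] = array_address
        self.write_ptr = (self.write_ptr + 1) % self.size

    def processor_read(self):
        """Routing Processor: read both RAMs, then increment the pointer."""
        if self.is_empty():
            return None
        entry = (self.dest_ram[self.read_ptr], self.addr_ram[self.read_ptr])
        self.read_ptr = (self.read_ptr + 1) % self.size
        return entry
```

Since a read is attempted only when the pointers differ, the reader and the writer never address the same location in the same operation, which is the property that makes the two-port RAM arrangement safe.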

4.4.3 The Packet Sorting Processors 

The Instruction Execution Units and the Microprogram 
Control Units of the Packet Sorting Processors are similar to 
those used in the three processor design for the Routing Pro- 
cessor (see section 3.2). The IEU control fields in the Micro- 
program Word for the Sorting Processors are presented in 
Figure 4.21. Figure 4.22 contains the MCU control fields and
the Jump Control Logic Function for the Packet Sorting Pro- 
cessors. 

4.4.4 The Packet Sorting Service Routine 

The Packet Sorting Service Routine is sense-loop 
driven. While the Shift Register Array Polling Circuit 
searches for unserviced packets, the Packet Sorting Processor
loops on the test bit. Once an unserviced packet is found by 
the polling circuit, the Packet Sorting Processor exits from 
the loop. The processor fetches the address of the packet 
from the halted poller. The packet's syndrome is fetched 
and sent as an address to the Syndrome Decoder ROM. Con- 
currently, the packet's service request flag is cleared and 
the poller is restarted. The ROM output is fetched and 
exclusively-ored with the fetched packet header. The cor- 
rected header is stored back into the array. Using the cor- 
rected header information, the Packet Sorting Processor 
determines the packet's destination. The destination informa- 
tion is then used to determine which Packet Routing Processor 
is to receive the packet's routing data. This is accomplished 
by sending the destination data to the Sorting Processor's 
Address Decoder. This decoder will generate the address of 
the Packet Routing Data Port associated with the destination 
of the sorted packet. Since a Routing Processor may route
packets destined for different ground stations, the different
destination codes of these packets must generate the address
of this Routing Processor's port. Since the different codes
will enable different address decoder lines, an encoding
scheme is needed. Wire-ANDing the different address lines
that must enable the same port will provide the system with
the single port address lines needed. The only constraint
associated with this scheme is the requirement that the address
decoders have active-low, open-collector outputs.

Fig. 4.21 Packet Sorting Processor IEU Microprogram Control Fields

Once the proper Packet Routing Data Port is addressed, 
the Sorting Processor checks to determine if the port is 
empty. If the port still contains valid data, the Packet 
Sorting Processor waits for the port to be emptied by the 
automatic port hardware. When the port is empty, the Packet 
Sorting Processor first sends the packet's destination data 
to the port. The processor then sends the packet's array 
address, sets the port's DAV flag and returns to the sense 
loop. Figure 4.23 contains the flow chart for this routine. 

A listing of this program is supplied in Figure 4.24. 
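
The header correction and port selection steps of the sorting routine can be illustrated with a short Python sketch. The values used here are invented for the example: the contents of the decoder ROM, the 4-bit destination mask and the destination-to-port table are all hypothetical. The dictionary `port_for_destination` only models the effect of wire-ANDing several decoder lines onto one port, not the circuit itself.

```python
def correct_header(header, syndrome, decoder_rom):
    """XOR the header with the error word the ROM holds for this syndrome."""
    return header ^ decoder_rom[syndrome]


def select_port(header, destination_mask, port_for_destination):
    """Mask out the destination and look up the Routing Processor's port."""
    destination = header & destination_mask
    return destination, port_for_destination[destination]


# Hypothetical ROM: syndrome 0 means no error, syndrome 0b101 flags bit 6.
decoder_rom = {0b000: 0b0000_0000, 0b101: 0b0100_0000}
# Hypothetical mapping: destinations 1-2 share port 0, 3-4 share port 1.
port_for_destination = {1: 0, 2: 0, 3: 1, 4: 1}

received = 0b0100_0011                      # header with bit 6 in error
corrected = correct_header(received, 0b101, decoder_rom)
print(select_port(corrected, 0b0000_1111, port_for_destination))  # (3, 1)
```

Several destination codes mapping to the same port number is the software analogue of the requirement that different decoder lines enable a single port address line.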

4.4.5 The Packet Routing Processors 

The Microprogram Control Units of the Packet Routing
Processors are similar to the one used in the three processor
design for the Routing Processor (see section 3.2). However,
the Instruction Execution Units (IEU) of the Packet Routing
Processors are redesigned to handle the Packet Routing Data
Ports. Since the Packet Routing Processors need no polling
circuits, the Direct Data (DB) bus is used to supply the
processors with the Packet Routing Data. This scheme saves
execution cycles since the processors are not required to
generate the addresses of the external data RAM's. A single
control signal from the processor's microprogram word controls
the DB SOURCE MUX, which selects either the Packet Destination
Data RAM or the Packet Array Address Data RAM. The output
from the MUX is tristated because the DB data bus is
internally shared by the output of the ALU's register file. The
redesigned IEU used by the Packet Routing Processors is
displayed in Figure 4.25. The IEU control fields in the
Microprogram Word for the Packet Routing Processors are given in
Figure 4.26. The MCU control fields in the Microprogram Word
and the Jump Control Logic Function are presented in
Figure 4.27.

Fig. 4.23 Packet Sorting Service Routine Flowchart

Fig. 4.23 Packet Sorting Service Routine Flowchart, continued

START:  If NEW-D = 0, JMP to START

            *Is there a shift register requiring service?
             NO: Loop @ START.

        SRS Polling Port -> Scratch 1

            *YES: Input the address of the shift register.

        Syndrome Generator Base Address + Scratch 1 -> Address Latch (µP1-D)
        Syndrome (R) -> Decoder ROM Address Latch; Reset Poller

            *Fetch header syndrome and send it to the Decoder ROM.
             Clear NEW-D and restart the poller.

        Decoder ROM Address -> Address Latch (µP2-D)
        [Decoder ROM] @ Syndrome (R) -> Q

            *Fetch error word from ROM.

        Header Base Address + Scratch 1 -> Address Latch (µP3-D)
        ALU EXOR Q -> Scratch 2, Header Port (R)

            *Correct the header. Store it into the S.R. Array and
             into Scratch 2.

        Scratch 2 AND Destination Mask -> Q

            *Determine packet destination.

        Q + Packet Routing Processor Base Address -> Address Latch (µP4-D)

LOOP:   If DAC-D = 0, JMP to LOOP

        Q -> selected Packet Routing Destination Data Port
        Scratch 1 -> selected Packet Routing Shift Register # Data
        Port; set DAV flag; JMP to START.

            *Select the proper Packet Routing Processor's Data
             Port. Send the packet's destination data. Then
             send the packet's S.R. array address. Set the port's
             DAV flag and return to the top of the program.

Figure 4.24 Packet Sorting Service Routine

Fig. 4.27 Packet Routing Processor MCU Control Fields
and Jump Control Logic Function

4.4.6 The Packet Routing Service Routine

The Packet Routing Service Routine is sense-loop
driven. The Packet Routing Processor loops, testing the
status bit which informs the processor when packet routing
data is available. When a packet's routing data is available,
the Packet Routing Processor leaves the loop. The processor
then fetches the packet's destination information. Next, the
packet's array address is fetched. Using the destination data,
the Packet Routing Processor selects the proper output queue
list. Concurrently, the Processor's index pointer for the
Packet Routing Data List is incremented. The packet's array
address is loaded into the Queue List Data Port by the
processor. The Processor then requests access to the queue list.
Requests for access are generated until the Packet Routing
Processor is allowed to access the queue list. Once access
is granted, the hardware automatically strobes the array
address data from the port into the queue list RAM. Meanwhile,
the Packet Routing Processor checks the status of the queue
list's corresponding output buffer. If the buffer is in the
Idle state, the processor updates the buffer's status to the
Empty state, releases the queue list and returns to the sense
loop. However, if the buffer is not in the Idle state, the
processor simply releases the queue list and returns to the
loop. The flow chart for this software routine is shown in
Figure 4.28. A listing of this program is given in Figure 4.29.
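
The control flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the microprogram itself: the function name `route_one_packet`, the `BufferStatus` enum and the use of Python containers for the Packet Routing Data List, the Output Queue Lists and the output status words (OSW) are assumptions made for the sketch, and queue list access arbitration is assumed to succeed immediately.

```python
from collections import deque
from enum import Enum


class BufferStatus(Enum):
    """Output buffer states named in the text (hypothetical encoding)."""
    IDLE = "idle"
    EMPTY = "empty"
    BUSY = "busy"


def route_one_packet(routing_data, queue_lists, osw):
    """Service one entry of the Packet Routing Data List, if any.

    routing_data: deque of (destination, array_address) pairs.
    queue_lists:  dict mapping destination -> deque of array addresses.
    osw:          dict mapping destination -> BufferStatus.
    Returns True if a packet was routed, False if the sense loop would spin.
    """
    if not routing_data:
        # No routing data available: the processor stays in the sense loop.
        return False
    destination, array_address = routing_data.popleft()
    # In the report, access to the queue list is arbitrated by hardware;
    # this sketch assumes the request is granted at once.
    queue_lists[destination].append(array_address)
    if osw[destination] is BufferStatus.IDLE:
        # An Idle buffer is marked Empty so the output side will service it.
        osw[destination] = BufferStatus.EMPTY
    return True
```

Calling `route_one_packet` repeatedly plays the role of the sense loop: it returns False while nothing is pending and True each time one packet's array address has been enqueued.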

4.5 The Output System 

The Output System consists of the Output Buffers, the 
Output Processors, the Output Switching Networks, and the Out- 
put Polling Circuits. Interfacing to this system are the 
Output Queue Lists, the Shift Register array and the ELIST 
Data Distribution System. The architectural organization of 
this system is presented below. 

4.5.1 Architectural Workload Division 

The two major system constraints that influence the
architectural organization of this system are:

1) Only one Output Processor must control an output 
buffer. Each output buffer must be assigned to only 
one Output Processor in order to eliminate resource 
contention. 

2) Only one Output Processor can have access to an out- 
put queue list. 



Fig. 4.28 Packet Routing Service Routine Flowchart

TEST:     If ROUTE-B = 1, JMP to TEST
              *Are there any packets requesting routing?
               NO: Loop @ TEST

          Destination Data RAM -> Scratch 1
          Shift Register Address RAM -> Scratch 2
              *YES: Input the packet's destination and array address.

          Scratch 1 + Output Queue List Base Address -> Address Latch (µP1-B);
          update packet data pointer
              *Select the Output Queue List and OSW of the destination
               buffer.

          Scratch 2 -> Output Queue List Data Port (N)
              *Send the packet's array address to the Output Queue List
               Data Port.

REQUEST:  Request Queue List (N)
              *Request access to the Output Queue List selected. If access
               is granted, the data from the Port is automatically stored.

          If STATUS-B = 1, JMP to REQUEST
              *If access is not granted, loop @ REQUEST. Proceed otherwise.

          If OSW = NOT IDLE, JMP to END
              *Is the output buffer idle?

          Set OSW=EMPTY; Release Output Queue List (N); JMP to TEST
              *YES: Update OSW, release Queue List and return to the top
               of the routine.

END:      Release Output Queue List (N); JMP to TEST
              *NO: Release queue list and return to the top of the routine.

Fig. 4.29 Packet Routing Service Routine
The Separate Systems Scheme, discussed in section 4.3.1, is
considered the best technique for organizing the Output
System so that it fulfills the above requirements. The system
architecture of a single Output Processor in the Separate
Systems Scheme appears in Figure 4.30. Each Output Processor
is assigned a fixed number of output buffers. In addition,
each processor is assigned a dedicated Output Switching
Network and a dedicated Output Polling Circuit, and is allowed
access to the Output Queue Lists that correspond to its
assigned output buffers. Since the implementation of this
architecture does not require redesigning the Output
Buffers, the Output Polling Circuits or the Output Switching
Networks, these hardware blocks are not discussed in detail
in this chapter (see section 3.1 for a review). The Output
Processors and their software are discussed below.

4.5.2 The Output Processors 

Both the Instruction Execution Units and the Microprogram
Control Units used by the Output Processors are
similar to those used by the Output Processor in the three
processor architecture (see section 3.2). Shown in Figure
4.31 are the IEU control fields of the Output Processors'
Microprogram Word. Figure 4.32 displays the MCU control
fields of the Microprogram Word and also the Jump Control
Logic Function for this class of processor.

Fig. 4.30 System Architecture for a Single Output Processor

Fig. 4.32 Output Processor MCU Control Fields

4.5.3 The Output Service Routine

Like all the software routines, the Output Service
Routine is sense-loop driven. The Output Processor
leaves the loop once its polling circuit locates an empty
output buffer. After leaving the loop, the processor fetches the
address of the buffer from the halted poller. Using this
information, the Output Processor selects the buffer's
corresponding output queue list. A request for access to this
list is generated by the processor until access is granted.
When access is granted, the Output Processor determines if
the queue list is empty. If the processor finds the list
empty, it changes the buffer's status from the
Empty state to the Idle state. Concurrently, the processor
releases the queue list, restarts the poller and returns to
the loop.

However, if the queue list accessed is not empty, the
Output Processor fetches the address of the packet to be
transmitted. Simultaneously, the output buffer's status is
changed from the Empty state to the Busy state, the queue
list is released and the poller is restarted. After the Output
Processor has completed all these tasks, it finds a free
data path in its dedicated Output Switching Network. This
data path is linked to the shift register containing the
packet to be transmitted. The Output Processor then links
the empty buffer to the data path. Once the path is complete,
the processor initiates the packet's transfer into the output
buffer. While this transfer is taking place, the Output
Processor checks the status of its ELIST Data Distribution
I/O port. If the Data Accepted (DAC) flag is not set, the
processor loops until it becomes set. Once the Output
Processor finds the flag set, it sends the array address of
the freed shift register. After loading the I/O port, the
processor sets the port's DAV flag and returns to the sense
loop. Figure 4.33 contains the flow chart for this routine
and the listing of this program appears in Figure 4.34.
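
The two branches of the routine (list empty versus list not empty) can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions: the name `serve_empty_buffer`, the string encoding of the buffer states and the use of a `deque` to stand in for the ELIST port are all hypothetical, and the data path linking, the serial transfer and the DAC/DAV handshake are hardware actions that are not modeled.

```python
from collections import deque


def serve_empty_buffer(buffer_id, queue_lists, osw, elist):
    """Handle one empty output buffer found by the poller.

    queue_lists: dict buffer_id -> deque of shift register array addresses.
    osw:         dict buffer_id -> "IDLE", "EMPTY" or "BUSY".
    elist:       deque of free array addresses (stands in for the ELIST port).
    Returns the array address whose packet is sent, or None if the list is empty.
    """
    queue = queue_lists[buffer_id]
    if not queue:
        # Nothing queued for this buffer: Empty -> Idle, back to the loop.
        osw[buffer_id] = "IDLE"
        return None
    array_address = queue.popleft()   # address of the packet to transmit
    osw[buffer_id] = "BUSY"           # Empty -> Busy
    # Data path selection and the packet transfer would happen here.
    elist.append(array_address)       # return the freed shift register address
    return array_address
```

In this model the freed array address is handed back as soon as the transfer is started, which mirrors the order of steps in the text.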
Fig. 4.33 Output Service Routine Flowchart, continued

OUTPUT:   If SERVICE-C = 0, JMP to OUTPUT
              *Is there an output buffer requesting service?
               NO: Loop @ OUTPUT

          Output Polling Port -> Q
              *YES: Input the address of the buffer.

          Q + Queue List Base Address -> Address Latch (µP1-C)

REQUEST:  Request Output Queue List (N)
              *Select the buffer's Output Queue List and OSW. Then
               request access.

          If STATUS-C = 1, JMP to REQUEST
              *Was access granted?
               NO: Request access again.

          If EMPTY-C = 0, JMP to IDLE
              *YES: Determine if the list is empty.
               List Empty: Branch to IDLE

          [Output Queue List (N)] @ OPTR (N) -> Scratch 1; Set
          OSW=BUSY; Release Output Queue List; Reset Poller
              *LIST NOT EMPTY: Input the S.R.# which contains the
               packet to be transmitted. Then update the OSW, restart
               the poller and release the queue list.

          Output Path Status Port Address -> Address Latch (µP2-C)
          Data Path Busy Status Port -> Scratch 2
              *Find a free data path.

          Scratch 2 + Data Path Latch A Base Address -> Address Latch (µP3-C)
          Scratch 1 -> Data Path MUX select Latch A(D)
              *Link the shift register to the data path.

          Scratch 2 + Data Path Latch B Base Address -> Address Latch (µP4-C)
          Q -> Data Path DeMUX select Latch B(D)
              *Link the output buffer to the data path.

          Data Path Transmit Control Base Address -> Address Latch
          Scratch 2 -> Data Bus Decoder (Ml-C)
              *START packet transfer.

          ELIST Data Port Address -> Address Latch (µP5-C)

LOOP:     If DAC-C = 0, JMP to LOOP
          Scratch 1 -> ELIST Data Port
              *Send the empty S.R.# to the ELIST data port when the
               port is empty.

          Send a DAV; JMP to OUTPUT
              *Send a DAV to the port and return to the top of the
               program.

IDLE:     Set OSW=IDLE; Release Output Queue List; Reset poller;
          JMP to OUTPUT
              *Update OSW, release queue list, restart poller and
               return to the top of the program.

Fig. 4.34 Output Service Routine




5.0 EVALUATION AND THROUGHPUT ANALYSIS 

The evaluations of the two packet switch architectures 
are presented in this chapter. The evaluation of the packet 
switch's performance is in terms of throughput. This evalua- 
tion is based on the software execution times. In the multiple 
processor architecture, additional parameters affect the sys- 
tem throughput. Therefore, equations relating the number of 
processors and the number of users to the system throughput 
are presented. 

5.1 Performance Evaluation 

In order to compute the maximum system throughput, two 
assumptions must be made. Both assumptions hold true for the 
two architectures. The first assumption is that the system 
is heavily loaded such that all output queues contain at least 
one packet awaiting transmission. The second assumption arises 
from the fact that processors never wait for internal hardware 
and that each system is virtually free from resource conten- 
tion. Thus, each processor is assumed to be busy 100% of the 
time under heavily loaded conditions. Therefore, a processor 
can process one packet in the amount of time required to exe- 
cute the assigned software routine completely without inter- 
ruption. Using these assumptions, an estimation of throughput 
for each multiprocessor architecture is presented below. 

5.1.1 Throughput Estimation for the Three Processor System 
In order to estimate the system throughput, equations
and relationships are developed. These calculations use the
following system parameters:

1) t_{p1} = Input Service Routine execution time

2) t_{p2} = Routing Service Routine execution time

3) t_{p3} = Output Service Routine execution time

4) R = Bit Rate per user

5) N = Number of Users

6) B = Number of Bits per Packet

7) F_p = System Throughput in Packets per Second

8) F_b = System Throughput in Bits per Second

A processor can process one packet in the amount of time 
required to execute the assigned software routine. Since 
each packet must be serviced by all three routines, the pro- 
cessor with the longest execution time will determine the 
maximum system throughput. The software execution time for 
each processor is listed in Table 5.1. A processor clock
cycle of 120 nanoseconds is assumed. Table 5.1 shows the
number of instruction cycles required and the time taken.
Some routines have several execution times listed. Each
value illustrates the effect of resource contention, the
state of the output queue lists or the state of the output
buffers.
                                                  Cycles      Time (µs)

Normal Operation (No Memory Contention):

  Input Service Routine                             11          1.32
  Output Service Routine
    (a) Transmit Packet                             16          1.92
    (b) Empty Queue                                  7          0.84
  Packet Routing Service Routine
    (a) Enqueue Packet                              15          1.80
    (b) Enqueue Packet and Update OSW               15          1.80

Worst Case Due to Memory Contention:

  Input Service Routine                             11          1.32
  Output Service Routine
    (a) Empty Queue                            [illegible]  [illegible]
    (b) Transmit Packet                             18          2.16
  Packet Routing Service Routine
    (a) Enqueue Packet (Default)                    19          2.28
    (b) Enqueue Packet and Update OSW (Default)     19          2.28
    (c) Enqueue Packet                              17          2.04
    (d) Enqueue Packet and Update OSW               17          2.04

Table 5.1 Software Execution Times for the Three Processor
System



As stated earlier, the packet switch's maximum throughput
is achieved when the processors are busy 100% of the time and
when no output queue lists are empty. Therefore, in order to
determine the maximum throughput, the longest execution time
must be selected from the following values:

1) The execution time for the Input Service Routine
under normal operating conditions.

2) The execution time for the Packet Routing Routine
when it enqueues a packet under normal operating
conditions.

3) The execution time for the Output Service Routine
when it transmits a packet under normal operating
conditions.

Comparing these values from Table 5.1, the execution time
for the Output Processor is found to be the largest of the
three. Therefore, the three processor system has a maximum
throughput which is limited by:

F_p \le 1/t_{p3} .    (5.1)

The system throughput in terms of bit rate is found by
multiplying the maximum packet throughput by the packet bit
length:

B \times F_p = F_b \le B/t_{p3} .    (5.2)

The system throughput in terms of bit rate is related to the
number of users by:

F_b = N \times R .    (5.3)

This can be expressed as:

N \times R \le B/t_{p3} .    (5.4)

Or,

t_{p3} \le B/(N \times R) .    (5.5)

In a heavily loaded system free from resource contention,
the Output Processor services one packet every 1.92 microseconds.
Therefore, the maximum packet throughput is:

F_p \le 1/(1.92 µs) = 520,833 packets/second    (5.6)

If a packet length of 10,240 bits/packet is used,
the maximum system bit rate is:

F_b = 10,240 \times F_p = 5.3 \times 10^9 bits/second.    (5.7)
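
The arithmetic behind equations 5.1, 5.6 and 5.7 can be reproduced with a short Python sketch. The cycle counts are the normal-operation values from Table 5.1 and the 120 ns clock is the one assumed in the text; the names `max_packet_rate` and `routine_cycles` are introduced here only for illustration.

```python
CLOCK_NS = 120          # processor clock cycle assumed in the text
BITS_PER_PACKET = 10_240


def max_packet_rate(cycles, clock_ns=CLOCK_NS):
    """Packets per second for a routine that takes `cycles` clock cycles."""
    return 1e9 / (cycles * clock_ns)


# Normal-operation cycle counts for the three processor system (Table 5.1).
routine_cycles = {"input": 11, "routing": 15, "output": 16}

# The routine with the most cycles limits the whole switch (equation 5.1).
bottleneck = max(routine_cycles, key=routine_cycles.get)
f_p = max_packet_rate(routine_cycles[bottleneck])   # packets per second
f_b = BITS_PER_PACKET * f_p                         # bits per second

print(bottleneck, round(f_p), f"{f_b:.2e}")
```

Changing `BITS_PER_PACKET` scales `f_b` proportionally while leaving `f_p` unchanged, since the per-packet processing time does not depend on packet size.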

An important point to note is that the system is designed
such that the processing time of each packet is independent
of the packet size. Therefore, an increase in the
packet length will increase the system bit rate proportionally.
However, due to the two internal serial transfers, a packet's
delay is affected by the packet's size. An additional drawback
of overly large packet sizes is that a significant portion
of a user's throughput is wasted when short messages are
transmitted. Therefore, the system's throughput in terms of
bit rate may be quite large while the actual information
rate could be small. All these points also hold true for the
multiple processor architecture.

5.1.2 Throughput Estimation for the Multiple Processor 
System 

The maximum throughput in bits/second of the multiple 
processor packet switch varies depending on the values of two 
parameters. These parameters are the packet size and the num- 
ber of processors implemented. In this section, the relation- 
ship between the throughput and the number of processors is 
presented. In order to evaluate this packet switch, new para- 
meters are needed. These new parameters are: 

1) C_1 = Number of Processors in the Input Processor Class

2) C_2 = Number of Processors in the Packet Sorting Processor Class

3) C_3 = Number of Processors in the Packet Routing Processor Class

4) C_4 = Number of Processors in the Output Processor Class

5) C = Total Number of Processors

6) t_{p1} = Input Service Routine execution time

7) t_{p2} = Packet Sorting Routine execution time

8) t_{p3} = Packet Routing Routine execution time

9) t_{p4} = Output Routine execution time

10) F_{pc1} = Input Processor Class throughput in packets
per second

11) F_{pc2} = Packet Sorting Processor Class throughput in
packets per second

12) F_{pc3} = Packet Routing Processor Class throughput in
packets per second

13) F_{pc4} = Output Processor Class throughput in packets
per second

The maximum throughput of the switch is limited by the 
maximum throughput of the class of processors which has the 
smallest maximum throughput. The throughput of each class 
of processor depends on the software execution times and the 
number of processors assigned to each class. Therefore, the 
throughput for each processor class is: 

F_{pci} \le (1/t_{pi}) C_i ,    1 \le i \le 4    (5.8)

In order to use this equation in the performance evaluation
of the multiple processor packet switch, the software
execution times must be known. Table 5.2 contains the software
execution times for each class of processor. Various
values are listed since the execution times of some routines
vary depending on the current state of the system. As stated
earlier, the packet switch's maximum throughput is achieved
when the processors are busy 100% of the time and when no output
queue list is empty. Therefore, the execution times used in
this throughput estimation are:

1) The execution time of the Input Routine when data
from the ELIST is available immediately.

2) The execution time of the Packet Sorting Service
Routine when the Packet Routing Data Port's DAC flag
is set.

3) The execution time of the Packet Routing Service
Routine when it enqueues a packet under normal operating
conditions without updating an OSW.

4) The execution time of the Output Service Routine when
it transmits a packet under normal operating conditions.

Using the data from Table 5.2 in equation 5.8, a table
listing the throughputs as a function of the number of
processors is constructed. Table 5.3 contains the data compiled
from the evaluation. A graph displaying the relationship
between the number of processors and the upper bound on the
system throughput is presented in Figure 5.1. This graph is
plotted using the data contained in Table 5.3.
                                                  Cycles      Time (µs)

Normal Operation (No Memory Contention):

  Input Service Routine                             13          1.56
  Packet Sorting Service Routine                    13          1.56
  Packet Routing Service Routine
    (a) Enqueue Packet                               9          1.08
    (b) Enqueue Packet and Update OSW                9          1.08
  Output Service Routine
    (a) Transmit Packet                             19          2.28
    (b) Empty Queue                                  7          0.84

Worst Case Due to Memory Contention:

  Input Service Routine                             13          1.56
  Packet Sorting Service Routine                    13          1.56
  Packet Routing Service Routine
    (a) Enqueue Packet (Default)                    13          1.56
    (b) Enqueue Packet and Update OSW (Default)     13          1.56
    (c) Enqueue Packet                              11          1.32
    (d) Enqueue Packet and Update OSW               11          1.32
  Output Service Routine
    (a) Transmit Packet                             23          2.76
    (b) Empty Queue                                 11          1.32

Table 5.2 Software Execution Times for the Multiple Processor
System

Throughput as a Function of the Number of Processors


A specific example is presented below to illustrate how
the number of processors required for a desired throughput is
determined:

Packet Length

B = 10,240 bits per packet

Desired Throughput

F_b \le 30 \times 10^9 bits per second
F_b/B = F_p \le 3.0 \times 10^6 packets per second

The processor assignments are determined using equation 5.8.

Number of Input Processors

F_{pc1} = 3 MPS \le (1/t_{p1}) C_1

C_1 \ge (3 \times 10^6 packets/sec)(1.56 \times 10^{-6} seconds/packet/processor)

C_1 \ge 4.68 processors.

Since C_1 must be an integer value, C_1 \ge 5 processors.

Number of Packet Sorting Processors

F_{pc2} = 3 MPS \le (1/t_{p2}) C_2

C_2 \ge (3 \times 10^6 packets/sec)(1.56 \times 10^{-6} seconds/packet/processor)

C_2 \ge 4.68 processors

C_2 \ge 5 processors.

Number of Packet Routing Processors

F_{pc3} = 3 MPS \le (1/t_{p3}) C_3

C_3 \ge (3 \times 10^6 packets/sec)(1.08 \times 10^{-6} seconds/packet/processor)

C_3 \ge 3.24 processors

C_3 \ge 4 processors.

Number of Output Processors

F_{pc4} = 3 MPS \le (1/t_{p4}) C_4

C_4 \ge (3 \times 10^6 packets/sec)(2.28 \times 10^{-6} seconds/packet/processor)

C_4 \ge 6.84 processors

C_4 \ge 7 processors.
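
The same sizing calculation can be written as a small Python helper. This is a sketch rather than part of the original design: the function name `processors_required` and the dictionary `class_cycles` are assumptions, the cycle counts are the normal-operation values from Table 5.2, and integer arithmetic in nanoseconds is used so that the rounding up to a whole processor is exact.

```python
def processors_required(target_pps, cycles, clock_ns=120):
    """Smallest integer C_i with (1/t_pi) * C_i >= target_pps (equation 5.8).

    target_pps: desired throughput in packets per second (integer).
    cycles:     clock cycles the class's service routine takes per packet.
    """
    busy_ns_per_second = target_pps * cycles * clock_ns
    # Ceiling division: processor-seconds of work needed per second.
    return -(-busy_ns_per_second // 1_000_000_000)


# Normal-operation cycle counts per processor class (Table 5.2).
class_cycles = {"input": 13, "sorting": 13, "routing": 9, "output": 19}

target = 3_000_000  # 3 MPS, as in the worked example
counts = {name: processors_required(target, c) for name, c in class_cycles.items()}
print(counts, "total:", sum(counts.values()))
```

For the 3 MPS target this reproduces the assignments of the worked example (5, 5, 4 and 7 processors).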


There is an important point to note regarding the
system throughput. As mentioned earlier, the system throughput
depends on the packet size and the number of processors
implemented. The important point of this relationship is
that the number of processors that can be implemented is
limited by the number of users. Each user is considered to
have one input and one output buffer. If one ground station
user is allocated two sets of buffers, that user is viewed as two
distinct users by the switch. The number of users limits the
throughput because the number of Input, Packet Routing and
Output Processors can never exceed the number of users. This
limitation arises since each user's workload cannot be
efficiently divided among more than one processor of the same class.
Therefore, the maximum attainable packet throughput for a fixed
number of users is achieved when one processor from each class
listed above is assigned to one user. As seen in Table 5.2,
the Output function requires the longest execution time of
the three classes listed above. As a result, this function
limits the system's maximum attainable packet throughput as
given by

F_p \le (1/t_{p4}) N .    (5.9)

This equation, which expresses the relationship between
the maximum throughput and the number of users, is plotted in
the graph of Figure 5.2. The importance of this relationship
is illustrated in the example given below.

Desired System Features:

N = 5 users
B = 10,240 bits per packet
F_b = 30 \times 10^9 bits per second

System Performance Evaluation Using Equation 5.9:

F_p \le (1 packet/2.28 microseconds) \times 5

F_p \le 2.19 \times 10^6 packets per second

F_b = B \times F_p \le 22.5 \times 10^9 bits per second

Fig. 5.2 System Throughput as


As seen by the results above, the system performance 
falls short of the desired goals. The system designer has 
three options available: 

1) Build the system and reduce each user's throughput to 
meet the lower performance rating. 

2) Increase the packet length. This solution faces the 
problems described in section 5.1.1. 

3) Assign the ground station users additional sets of 
buffers so that the packet switch serves more than 
five users. This solution allows additional pro- 
cessors to be implemented, which will increase the 
system's throughput rating. 

The purpose of the above example is not so much to explain
how to solve performance problems as to stress the
importance of the relationship presented in equation 5.9.
Without this relationship, one would determine the number of
processors required by referencing Figure 5.1. The value
obtained may be impossible to implement due to the user/processor
limitations.
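
As a rough illustration of the third option, the number of users (sets of buffers) needed before equation 5.9 permits a given bit rate can be computed directly. The helper below is a sketch under the example's assumptions (19-cycle Output Service Routine from Table 5.2, 120 ns clock, 10,240-bit packets); the name `min_users_for_target` is hypothetical.

```python
def min_users_for_target(target_bps, bits_per_packet=10_240,
                         output_cycles=19, clock_ns=120):
    """Smallest N such that B * (1/t_p4) * N >= target_bps (equation 5.9).

    All arguments are integers; the result is rounded up to a whole user.
    """
    numerator = target_bps * output_cycles * clock_ns    # bits * ns
    denominator = bits_per_packet * 1_000_000_000        # bits * ns per user
    return -(-numerator // denominator)                  # ceiling division


print(min_users_for_target(30_000_000_000))
```

Under these assumptions a 30 \times 10^9 bits per second target needs at least seven users, which matches the seven Output Processors found in the processor-count example and explains why five users fall short.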
A final point regarding the maximum obtainable throughput
of the multiple processor system is that the throughput given
by Equation 5.9 has a finite upper bound which is not determined
solely by the number of users. As stated earlier, service for
each packet requires a read and a write operation at ELIST.
Therefore, ELIST will limit the maximum packet throughput of
the packet switch. Using the hardware technology currently
available, ELIST is designed to provide and accept address data
approximately every 100 nanoseconds. This fact limits the
system's maximum attainable packet throughput as given by

F_p \le (1/t_{p4}) N \le 1/(100 \times 10^{-9})

F_p \le (1/t_{p4}) N \le 10 \times 10^6 packets/second    (5.10)

A system using a packet length of 10,240 bits will have
a maximum bit rate limited by

F_b = B \times F_p \le (10,240 bits/packet) \times (10 \times 10^6 packets/second)

F_b \le 102.4 \times 10^9 bits/second.    (5.11)

As new and faster hardware and processor technology
becomes available, the overall performance of this packet
switch will improve.
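
Equations 5.9 and 5.10 together say that the packet throughput is the smaller of the user-limited bound and the ELIST bound. The Python sketch below combines them; the function name `max_packet_rate` and its parameter names are assumptions, and the default values are the figures quoted in the text (19 cycles at 120 ns for the Output Service Routine, one ELIST address roughly every 100 ns).

```python
def max_packet_rate(users, output_cycles=19, clock_ns=120, elist_ns=100):
    """Upper bound on packets per second from equations 5.9 and 5.10."""
    user_bound = users * 1e9 / (output_cycles * clock_ns)   # (1/t_p4) * N
    elist_bound = 1e9 / elist_ns     # ELIST handles one address per elist_ns
    return min(user_bound, elist_bound)


for n in (5, 22, 23):
    print(n, round(max_packet_rate(n)))
```

With these figures the ELIST bound takes over once there are 23 or more users, so adding users beyond that point no longer raises the bound.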
5.2 Evaluation of the Processor

Implementation of the packet switch may require the
construction of a customized processor chip. Therefore, a review
of the characteristics of the AMD 2903 ALU will provide the
system designer with an insight into the design of a processor
which is better tailored for this particular application. This
review begins with the available features of the AMD 2903 ALU
and ends with the features not provided by this chip that
would enhance processor performance.

The AMD 2903 ALU provides ample arithmetic and logic operations
for the packet switch. In fact, the number of operations can
be reduced to save hardware complexity. The only functions
required are the addition operation, the logical AND and the
logical OR. The on-chip register file is ideal for holding
scratchpad variables. In both multiprocessor designs, the
full capacity of this file is never used. Therefore, this
component could be reduced in size without degrading system
performance. The single Q Register, which provides a work
area for some operations, was quite adequate. The provided
ZERO flag went unused and could be eliminated from the custom
designed processor.

There are several features the AMD 2903 ALU architecture
does not support. These features would make the processor
better suited for this particular application. They are:

1) Internal tristate control of the DB Direct Data Input
Bus. This bus is not currently tristated internally because
it is bidirectional. This allows data to enter the
ALU from external hardware as well as allowing data
from the register file to be sent directly to external
hardware. Since direct transmission of data from the
register file to external hardware is not required,
this bus could be tristated internally to save external
hardware. A possible alternative would be to
increase the size of the internal select MUX. In this
scheme, the DB input bus would no longer need to share
the internal data bus with the register file.

2) Additional Direct Data Inputs. These inputs save
execution cycles since the processor does not need to
generate a device's address before a read operation
can be performed. These inputs can be used whenever
the processor is required to access a single unique
system device. Since the Data Path Busy Status Ports
are unique system devices, this feature would reduce
the software execution times for the Input and Output
Processors in both architectures. This scheme may
require larger internal Select MUXs and more select
control signals. However, there is one way
to increase the number of direct data inputs without
increasing the Select MUX size or the number of control
lines. As mentioned earlier, only a small portion
of the register file is used. In fact, the A-Register
File is never used. Therefore, this component
could be removed and its input to the Select MUX could
be replaced with a direct data input. This particular
feature would increase system throughput directly and
should be considered an important design criterion.

3) Internal data bus latches. This feature would provide
for the stabilization of ALU data inputs without the
use of external latches.

All these features are recommended for any processor
custom designed for the packet switches.

5.3 Packet Losses 

If the throughput rating of the packet switch is exceeded, 
packets will be lost even when there are no hardware or soft- 
ware failures in the system. However, an important point to 
make concerning these packet losses is that the system will 
always recover at some point in time. In both architectures, 
packets can be lost due to overflow in three components. 
Overflow can take place in an additional component of the 
multiple processor system. The components which are suscep- 
tible to overflow are: 

1) The Input Buffers 

2) The Output Queue Lists 

3) ELIST 

4) The Packet Routing Data Ports' queues. 

Even with double buffering, an input buffer will overflow
if its user exceeds the allotted channel capacity. The
older of the two packets residing in the input buffer will
be lost as the new packet is shifted into the buffer.


If any output queue list becomes full, the packet switch
will encounter serious problems. When a queue list becomes
full, the two index pointers will be equal in value. This is
the same situation as for an empty list. When the two pointers
are equal, the Output Processor assumes the list is empty and
does not access the list until new data is placed into the
queue list. Therefore, the list remains full until new data
is placed into the list, overwriting valid data. Only after
overflow has occurred can the Output Processor access the
list. Two serious problems arise from this overflow condition.
The first problem is that once overflow takes place in the
queue, no less than the entire list of original data will be
lost. The second problem is a result of the first problem.
As stated earlier, the data stored in the Output Queue Lists
are the array addresses of routed packets. Therefore, if these
addresses are lost, the routed packets will never be
transmitted and they will remain in the Shift Register Array
indefinitely. Since they are never transmitted, their array
addresses will never be returned to ELIST. This fact could
cause ELIST to become empty. An empty ELIST and the
associated problems of this situation are discussed next.
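
The full-versus-empty ambiguity can be demonstrated with a small Python model of a queue list that, like the one described, has only two index pointers and no separate full flag. The class name `PointerQueue` and the pointer names `iptr` and `optr` are assumptions made for this sketch; the real queue lists are RAM with hardware-managed pointers.

```python
class PointerQueue:
    """Circular queue list with two index pointers and no full flag."""

    def __init__(self, size):
        self.slots = [None] * size
        self.iptr = 0   # where the next array address is written
        self.optr = 0   # where the next array address is read

    def looks_empty(self):
        # Equal pointers are read as "empty", even when the list is full.
        return self.iptr == self.optr

    def enqueue(self, address):
        self.slots[self.iptr] = address           # overwrites without checking
        self.iptr = (self.iptr + 1) % len(self.slots)

    def dequeue(self):
        if self.looks_empty():
            return None
        address = self.slots[self.optr]
        self.optr = (self.optr + 1) % len(self.slots)
        return address


q = PointerQueue(4)
for addr in (10, 11, 12, 13):   # fill the list completely
    q.enqueue(addr)
print(q.looks_empty())          # the full list is indistinguishable from empty
q.enqueue(99)                   # overflow overwrites address 10
print(q.dequeue(), q.dequeue()) # only 99 is delivered, then the list looks empty
```

In this model addresses 11, 12 and 13 are never read after the overflow, which corresponds to packets stranded in the Shift Register Array.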

If ELIST becomes empty and a new packet arrives at the
input, the oldest packet in the shift register array will be
lost as the new packet is stored in its place. Packets will
continue to be lost until the Output Processors return enough
array addresses to ensure that the next shift register address
fetched by an Input Processor is valid data. ELIST will
become empty when the system users exceed the packet switch's
throughput rating.

In the multiple processor design, if a Packet Destination
Data list becomes full, the system will face problems similar
to those caused by a full Output Queue list. This is due to
the fact that both lists share the same data structure. Again,
packets will be trapped in the Shift Register Array because
the data lost during overflow is needed for routing. If a
packet is never routed, it can never leave the array. There
is no way to re-sort these packets, which means the lost
routing information can never be recovered. As with a full
Output Queue list, the entire list of original data will be
overwritten before the system can recover.

Packet losses reduce the actual throughput of a system 
since users must retransmit all packets lost in transmission. 
Since a large and effective throughput is the primary goal of 
this work, care must be taken to ensure against packet losses. 
The system designer must research the queuing problems of the 
switch before deciding on the size of the Shift Register Array 
and all the various queue lists. If the packet switch is
built with an insufficient number of array locations and/or
insufficient queue lengths for its throughput rating, packet
losses will be inevitable. In addition, part of the responsibility of
ensuring against packet losses belongs to the users themselves. 
They must not exceed the channel capacities assigned to them. 
5.4 Fault Detection and Fault Tolerance

Since the packet switches presented in this work are
part of a proposed communication satellite network, fault
detection and fault tolerance are desirable features. Once
the satellite is placed into orbit, maintenance and repair
work will be quite expensive or impossible. Therefore, if
the packet switch can handle its own maintenance problems,
the useful life of the satellite will be extended.

The failure of some components will cause an entire
channel to fail. An example of such a component is an input
buffer. If an input buffer fails, the channel it serves will
also fail. Some component failures will cause intermittent
packet losses. An example of this type of failure would occur
if one location in the Shift Register Array failed. Only the
packets stored in this location would be lost or corrupted.
Both of these types of failures will degrade system performance
but the packet switch can still operate. However, there
are certain component failures which will cause the entire
packet switch to fail. These components should be either
fault tolerant through the use of redundant circuitry or
self-diagnostic. The self-diagnostic components should be able to
hand over their tasks to a spare component upon detection of
a fault. The components which fall into this category for
the three processor system are:

1) The Input Processor 

2) The Routing Processor 

3) The Output Processor 

4) All the polling circuits 

5) Both Data Path Busy Status Ports 

6) ELIST 


The components which can cause a channel loss in the 
three processor design due to a failure are: 


1) Input Buffers 

2) Output Queue Lists 

3) Output S 

4) Output 


The components which can cause intermittent packet losses 
in the three processor design due to a failure are: 


1) Data paths in the Input Switching Network 

2) Shift Register Array locations 

3) Data paths in the Output Switching Network 


In the multiple processor design, the only system component 
that may cause the entire packet switch to fail, should 
it fail, is the ELIST. Single or multiple channel failures 
could result if one of the following fails: 


1) Input Buffers 

2) Input Polling Circuits 

3) Input Processors 

4) Data Path Busy Status Ports 

5) Packet Destination Data Ports 

6) Packet Routing Processors 

7) Output Queue Lists 

8) Output Processors 

9) Output Status Words 

10) Output Polling Circuits 

11) Output Buffers 

As noted above, if the Data Path Busy Status Port of an 
Input or Output Switching Network fails, the loss of some 
channels will occur as a result. However, if only a single 
Input (Output) Switching Network is used by the switch (as in 
the case of the three processor system), a status port failure 
will result in the failure of the entire packet switch. Thus, 
system reliability and elimination of resource contention are 
achieved with multiple Switching Networks. 

The components which can cause packet losses in the 
multiple processor design due to a failure are: 

1) Data paths in the Input Switching Network 

2) Shift Register Array locations 

3) Shift Register Polling Circuits 

4) Packet Sorting Processors 

5) Data paths in the Output Switching Network 

Now that the impact of each component failure is identified, 
the system designer can decide what level of fault 
detection and fault tolerance is needed for each component. 




6.0 QUEUE THEORETIC MODELLING FOR CALCULATION OF THE AVERAGE 
RESPONSE TIMES AND THE AVERAGE QUEUE SIZES 

6.1 Introduction 

In this section queue theoretic analysis and evaluation 
of the proposed designs are presented. Analytical relationships 
between the average response times and the design parameters 
of the switch are obtained. These expressions are to be used 
to evaluate the performance of the three designs of the switch 
for various values of these parameters. Also, the average queue 
sizes in the shift register array are obtained. This queue 
size gives an idea as to the required size of these shift 
register arrays in the various designs. 

6.2 Design Parameters of the Switch 

The average response time of the switch and the average 
size of the shift register array depend on a number of 
parameters. The more important of these are: 

1) $\phi$ = clock cycle time of the microprocessor - This 
speed determines the time taken by the processor to 
serve a packet at the various stages of its service. 

2) $t_{p1}$ = duration of the input interrupt service routine. 

3) $t_{p2}$ = duration of the output buffer interrupt service 
routine for packets. 

4) $t_{p3}$ = duration of the routing service routine. 

5) $t_{p4}$ = duration of the sorting service routine. 

6) R = bit rate/user. 

7) N = number of input lines connected to the switch. 

8) B = number of bits/packet. 

9) $\gamma_i$ = destination function - this function determines 
the fraction of the total number of arriving packets 
going to individual output lines. 

10) $S_i$ = output line speed - this speed determines the 
time required to transmit a packet to a particular 
destination. Different lines may have different 
speeds. 

11) $F_r$ = system packet rate in packets/sec. 

12) $F_q$ = system throughput in bits/sec. 

13) M = number of output lines. 

14) K = number of packet size storage locations in the 
shift register array. 

15) $\tau_i$ = time taken for unsuccessful polling of one line 
at the i-th queue. 

16) $\lambda$ = overall average arrival rate (packets/sec.). 

17) T = time needed to shift one bit internally. 

18) $N_j$, j=1,2,3,4 = number of processors at the input, 
output, routing and sorting service points respectively. 

6.3 The Single Processor Design 
6.3.1 Introduction 

It appears from the proposed single processor 
architecture and operation of the switch that queues build 
up in the switch as shown in Figure 6.1. In this queueing 
model packets queue for service by the processor in three 
places. Firstly, the arriving packets queue for inputting 
into the shift register array. Secondly, these packets await 
the routing service which includes header analysis, error 
analysis, generation of ACK's and NACK's, and separating the 
packets into software queues. Finally, these packets queue 
for outputting. The routing service is to be performed 
by the processor whereas the inputting and the outputting 
functions involve service by a polling circuit in addition 
to that by the processor. Also, the inputting function has 
the highest priority, the outputting function has the second 
highest priority and the routing service has the lowest 
priority. This priority assignment is assumed as the incoming 
packets have to be attended to upon their arrival, otherwise 
they will be lost. Also, the output lines, being slower than 
the switch itself, cause a bottleneck in the system. Hence, 
whenever an output line is free to transmit messages, it 
should be serviced as quickly as possible. Thus, the outputting 
process is given the second highest priority. 

The packets change priority class after receiving service 
and the whole system can be modelled as a single server (the 
processor) serving customers of three levels of priority as 
shown in Figure 6.2. The packets of various priorities queue 
separately for service. The average time spent by a packet 
in the switch (average response time) is the sum of the 
waiting times and the service times at the three queues. Next, 
expressions are derived for the average waiting times, the 
overall average response time, and the average queue sizes at 
the various queues. 



6.3.2 Parameters of the Input Queue (highest priority) 

(a) The Arrival Process 

The total arrival at the input queue is the 
sum of the arrivals on all the input lines. It is assumed 
that the arrival on the i-th input line is Poisson with average 
rate $\lambda_{1i}$. Then the overall arrival at the input queue is 
Poisson with arrival rate 

$$\lambda_1 = \sum_{i=1}^{N} \lambda_{1i} = \lambda \tag{6.1}$$

(b) Service Time 

The service time at this queue consists of 
polling time to locate the packet, transfer setting up time 
and the actual transfer time. However, the processor is 
free to service other lines as soon as a transfer is set up 
and there is a sufficient number of transfer paths 
available so that the actual transfer of any 
packet does not delay the servicing of any other packet. 
Thus, for the purpose of calculating the average waiting time 
for packets in this queue, we consider the service time 

$$T_1 = \text{polling time} + \text{setting-up time} = t_1 + t_{p1} \tag{6.2}$$

where $t_{p1}$ is a constant. 

We need the mean and the second moment of $T_1$ and, hence, 
those of $t_1$. If there are N input lines, polled equally, 
then a particular packet may be polled immediately or it may 
have to wait until N-1 other lines are polled, and the 
probability of starting the scan at any one particular line is $1/N$. 
Thus, the average number of lines polled before the particular 
one is polled is 

$$\sum_{i=0}^{N-1} \frac{i}{N} = \frac{N-1}{2} \tag{6.3}$$

and the average time spent for unsuccessful polling is 

$$E[t_1] = \frac{N-1}{2}\,\tau_1$$

where $\tau_1$ is the time taken for unsuccessful poll of 
one line. Also, the mean square value of the polling time is 

$$E[t_1^2] = \sum_{i=0}^{N-1} \frac{(i\tau_1)^2}{N} = \frac{(N-1)(2N-1)}{6}\,\tau_1^2 \tag{6.4}$$

Hence, the average service time 

$$E[T_1] = \frac{N-1}{2}\,\tau_1 + t_{p1} \tag{6.5}$$

and the mean square value of $T_1$ is 

$$E[T_1^2] = \frac{(N-1)(2N-1)}{6}\,\tau_1^2 + t_{p1}^2 \tag{6.6}$$

(c) Utilization Factor 

$$\rho_1 = \lambda_1 E[T_1] = \lambda\left[\frac{N-1}{2}\,\tau_1 + t_{p1}\right] \tag{6.7}$$
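
The service-time moments and the utilization factor of the input queue can be checked numerically. The following is a minimal sketch, not part of the original design: the function names and the numerical values are hypothetical, and the second moment is taken in the two-term form (polling term plus $t_{p1}^2$, cross term left out) that the expressions for the three queues appear to use.

```python
def polling_service_moments(n_polled, tau, t_p):
    """Return (E[T], E[T^2]) for T = polling time + t_p.

    n_polled: number of lines (or storage locations) scanned, e.g. N
    tau:      time for one unsuccessful poll, in seconds
    t_p:      constant processor set-up time, in seconds
    """
    # The scan is equally likely to start at any position, so the number of
    # unsuccessful polls before the wanted line is uniform on 0..n_polled-1.
    mean_poll = sum(i * tau for i in range(n_polled)) / n_polled
    mean_sq_poll = sum((i * tau) ** 2 for i in range(n_polled)) / n_polled
    mean_t = mean_poll + t_p             # closed form: (N-1)/2 * tau + t_p
    # Assumption: the cross term 2 * t_p * E[poll] is left out, following
    # the two-term form of the mean square value.
    mean_sq_t = mean_sq_poll + t_p ** 2  # (N-1)(2N-1)/6 * tau^2 + t_p^2
    return mean_t, mean_sq_t


def utilization(arrival_rate, mean_t):
    """Utilization factor rho = lambda * E[T]."""
    return arrival_rate * mean_t


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    mean_t1, mean_sq_t1 = polling_service_moments(n_polled=10, tau=0.2e-6, t_p=1.0e-6)
    rho_1 = utilization(arrival_rate=8e4, mean_t=mean_t1)
    print(f"E[T1] = {mean_t1:.3e} s, E[T1^2] = {mean_sq_t1:.3e} s^2, rho1 = {rho_1:.3f}")
```

The same function applies to the output queue (scan over M lines) and to the routing queue (scan over K storage locations) by changing `n_polled`, `tau` and `t_p`.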
6.3.3 Parameters of the Output Queue (Second highest 
priority) 

(a) The Arrival Process 

This queue, in fact, consists of M separate 
queues, one for each output line. A packet from this queue 
is serviced when the corresponding output buffer is empty. An 
empty output buffer produces an interrupt that is recognized 
by a polling circuit, and is serviced by the processor if there 
is a packet to be transmitted in the corresponding output queue. 
If there is no packet in the corresponding output queue, then 
this interrupt is disabled until a packet is available. 

The time spent in this queue is calculated in two stages: 
firstly, the time spent in waiting for and being serviced by 
the processor and, secondly, the time spent in transferring and 
transmitting packets from the shift register array to the 
output lines. 

All the packets in all the output queues and the packets 
in the input queue affect the time spent by any packet waiting 
in any of the output queues for the processor. However, the 
time for transferring and transmission of a packet depends only 
on the speed of the corresponding output line because the 
processor can attend to other packets as soon as a transaction has 
been set up. Hence, to find the waiting time, we shall consider 
all transactions in the output queues to form one queue. It 
should be noted that it is the interrupts by the output buffers 
that are serviced by the processor. However, the interrupts 
are serviced only if there is a transaction available for 
transfer in the corresponding output queue. Thus, we are 
assuming that the arrival of the interrupts follows the same 
distribution as the arrival of the packets to the output 
queues. This arrival process is, in fact, non-Poisson. However, 
we shall assume it to be Poisson with the understanding 
that the results obtained are the worst case ones. The arrival 
rate is $\lambda_2 = \lambda_1 = \lambda$. 

(b) Service Time 

The relevant service time for calculating the 
waiting time is 

$$T_2 = \text{polling time} + \text{setting-up time} = t_2 + t_{p2} \tag{6.8}$$

where $t_{p2}$ is a constant. 

The transfer time is not included here because it does not 
affect the waiting time for service by the processor. Following 
the arguments given in connection with the polling time for the 
input queue, it can be shown that the average service time 

$$E[T_2] = \frac{M-1}{2}\,\tau_2 + t_{p2} \tag{6.9}$$

and 

$$E[T_2^2] = \frac{(M-1)(2M-1)}{6}\,\tau_2^2 + t_{p2}^2 \tag{6.10}$$

(c) The Utilization Factor 

The utilization factor connected with the 
service by the processor, for this queue is 

$$\rho_2 = \lambda_2 E[T_2] \tag{6.11}$$


6.3.4 Parameters of the Queue for Routing Service 
(third highest priority) 

(a) The Arrival Process 

The arrival process is not exactly Poisson. 
However, for the purpose of this analysis, it is assumed to 
be Poisson with the understanding that the results obtained 
are the worst case ones. The arrival rate is $\lambda_3 = \lambda_2 = \lambda$. 

(b) Service Time 

The service time 

$$T_3 = \text{polling time} + \text{processing time} = t_3 + t_{p3} \tag{6.12}$$

where $t_{p3}$ is a constant. Following the arguments given in 
connection with the input queue, it can be shown that 

$$E[T_3] = \frac{K-1}{2}\,\tau_3 + t_{p3} \tag{6.13}$$

and 

$$E[T_3^2] = \frac{(K-1)(2K-1)}{6}\,\tau_3^2 + t_{p3}^2 \tag{6.14}$$

where K is the number of storage locations (in packets) in 
the shift register array and $\tau_3$ is the time spent in 
unsuccessful polling of a storage location. 

(c) Utilization Factor 

The utilization factor for this queue is 

$$\rho_3 = \lambda_3 E[T_3] \tag{6.15}$$

6.3.5 Expression for the Average Response Time 

Equations derived in the previous section are now 
used to obtain expressions for the response time of the switch. 

The queues use random dispatching (polling) and pre-emptive 
queueing disciplines and do not give preference to packets with 
shorter service times. As this dispatching discipline is 
independent of service time, the mean waiting times are the same 
as those for Head-of-Line service discipline. However, we take 
the polling function into account by adding the average time 
due to unsuccessful polling to the actual processing time by the 
processor. Then the average waiting time at the queue with the 
j-th priority is [7,8] 

$$E[t_{wj}] = \frac{1}{1 - \sum_{i=1}^{j-1} \rho_i} \left[ E[T_j] \sum_{i=1}^{j-1} \rho_i + \frac{\sum_{i=1}^{j} \lambda_i E[T_i^2]}{2\left(1 - \sum_{i=1}^{j} \rho_i\right)} \right] \tag{6.16}$$

j = 1,2,3. 
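
The waiting-time expression can be evaluated for all three priority levels at once. The sketch below is illustrative only: it assumes equation (6.16) has the usual pre-emptive resume form, in which the waiting time at level j is the term $E[T_j]\sum_{i<j}\rho_i$ plus $\sum_{i\le j}\lambda_i E[T_i^2] / (2(1-\sum_{i\le j}\rho_i))$, all divided by $1-\sum_{i<j}\rho_i$. The function name and argument layout are hypothetical.

```python
def priority_waiting_times(arrival_rates, mean_t, mean_sq_t):
    """Average waiting time at each priority level (index 0 = highest).

    arrival_rates: lambda_j for each level, packets/sec
    mean_t:        E[T_j] for each level, seconds
    mean_sq_t:     E[T_j^2] for each level, seconds^2
    """
    if not (len(arrival_rates) == len(mean_t) == len(mean_sq_t)):
        raise ValueError("all inputs must have one entry per priority level")

    rho = [lam * m for lam, m in zip(arrival_rates, mean_t)]
    waits = []
    for j in range(len(rho)):
        sigma_prev = sum(rho[:j])      # load of all higher-priority levels
        sigma_j = sigma_prev + rho[j]  # load up to and including level j
        if sigma_j >= 1.0:
            # The formula is only meaningful while the cumulative load is below 1.
            raise ValueError(f"cumulative utilization {sigma_j:.3f} >= 1 at level {j + 1}")
        second_moment_sum = sum(
            lam * s for lam, s in zip(arrival_rates[: j + 1], mean_sq_t[: j + 1])
        )
        wait = (
            mean_t[j] * sigma_prev
            + second_moment_sum / (2.0 * (1.0 - sigma_j))
        ) / (1.0 - sigma_prev)
        waits.append(wait)
    return waits
```

With a single priority level the expression reduces to the familiar M/G/1 waiting time $\lambda E[T^2] / (2(1-\rho))$, which gives a quick sanity check on any implementation.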


The average of the total time spent by a packet in the input 
queue (highest priority) (time spent in waiting, being 
serviced by the processor and being transferred to the shift 
register array from the input buffers) is 

$$E[t_{q1}] = E[t_{w1}] + E[T_1] + E[T_{t1}] \tag{6.17}$$

where $T_{t1}$ is the transfer time at queue 1. The average 
of the total time spent by a packet in the output queue (second 
highest priority) is calculated in the following way: 


(a) Arrival Process 

This queue consists of M separate queues and the 
waiting time is different in the different queues as the waiting 
time in a queue depends on the arrival process and the speed of 
the corresponding output line. The arrival to each of the queues 
is assumed to be Poisson. However, the arrival rate may be 
different for different queues. The arrival rate to the i-th 
component queue of this second priority queue is 

$$\lambda_{2i} = \gamma_i \lambda_2 = \gamma_i \lambda \tag{6.18}$$

where $\gamma_i$ is specified by the destination function such that 
a fraction $\gamma_i$ of the total arrivals at this second priority 
output queue goes to its i-th component queue. 

$$\gamma_i < 1; \quad \sum_{i=1}^{M} \gamma_i = 1 \tag{6.19}$$

Hence, the average service time at the i-th component 
queue is 

$$E[T_{2i}] = E[t_{w2}] + t_{p2} + E[T_{t2i}] = E[t_{w2}] + t_{p2} + T_{t2i} \tag{6.20}$$

where $T_{t2i}$, the transfer time, is a constant and $t_{p2}$, the setting 
up time, is also a constant. $E[T_{t2i}]$ = average transfer time 
from the shift register array to the output buffer + average 
transmission time over the i-th output line 
$\approx$ average transmission time, so that 

$$E[T_{t2i}] \approx \frac{B}{S_i} \tag{6.21}$$

where $S_i$ is the transmission speed in bits/sec of the i-th 
output line. The utilization factor at the i-th component 
queue is 

$$\rho_{2i} = \lambda_{2i} E[T_{2i}] \tag{6.22}$$

Also 

$$E[T_{2i}^2] \approx \frac{B^2}{S_i^2} + t_{p2}^2 \tag{6.23}$$

neglecting the cross multiplication terms and $E[t_{w2}]^2$ as 
small. Then, the average time spent in waiting at the i-th 
component queue of the second priority queue is 

$$E[t_{w2i}] = \frac{\lambda_{2i} E[T_{2i}^2]}{2(1 - \rho_{2i})} \tag{6.24}$$

Thus, the average total time spent in the i-th component 
queue of the second priority queue is 

$$E[t_{q2i}] = E[T_{2i}] + \frac{\lambda_{2i} E[T_{2i}^2]}{2(1 - \rho_{2i})} \tag{6.25}$$

The overall average time spent in waiting and in service at 
the second priority queue is 

$$E[t_{q2}] = \sum_{i=1}^{M} \frac{\lambda_{2i}}{\lambda_2} E[t_{q2i}] = \sum_{i=1}^{M} \gamma_i E[t_{q2i}] \tag{6.26}$$
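
The chain of steps for the output queue (per-line arrival rate, service time, utilization, waiting time and the weighted sum over the M lines) can be sketched as follows. This is an illustration under stated assumptions rather than the report's own routine: it takes the transfer time as $B/S_i$, uses the two-term second moment $B^2/S_i^2 + t_{p2}^2$, and treats $E[t_{w2}]$ as a number already obtained from the priority-queue waiting time. The function and argument names are hypothetical.

```python
def output_queue_response_time(arrival_rate, gammas, speeds, packet_bits, t_p2, e_tw2):
    """Overall average time spent at the second-priority (output) queue.

    arrival_rate: total packet arrival rate lambda, packets/sec
    gammas:       destination function, fraction of traffic per output line
    speeds:       output line speeds S_i, bits/sec
    packet_bits:  packet size B, bits
    t_p2:         processor set-up time for output, seconds
    e_tw2:        average wait for the processor at queue 2, seconds
    """
    if len(gammas) != len(speeds):
        raise ValueError("one speed is needed per output line")

    total = 0.0
    for gamma_i, s_i in zip(gammas, speeds):
        lam_i = gamma_i * arrival_rate              # per-line arrival rate
        transmit = packet_bits / s_i                # transmission time B / S_i
        mean_t = e_tw2 + t_p2 + transmit            # average service time
        mean_sq_t = transmit ** 2 + t_p2 ** 2       # cross terms neglected
        rho_i = lam_i * mean_t                      # per-line utilization
        if rho_i >= 1.0:
            raise ValueError(f"output line overloaded: rho = {rho_i:.3f}")
        wait_i = lam_i * mean_sq_t / (2.0 * (1.0 - rho_i))
        total += gamma_i * (mean_t + wait_i)        # weight by share of traffic
    return total
```

For a single output line with no processor overhead the result reduces to the transmission time plus the M/D/1 waiting time, which is a convenient check.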


The total average time spent by a packet in the queue for 
routing service (the third highest priority queue) is 

$$E[t_{q3}] = E[t_{w3}] + E[T_3] \tag{6.27}$$

Thus, the overall average response time = the average total 
time spent by a packet in the switch 

$$E[t_q] = E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \tag{6.28}$$

Putting back the expressions for the relevant quantities in 
equation (6.28) we get the overall average response time 

$$\begin{aligned}
E[t_q] &= E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \\
&= E[t_{w1}] + E[T_1] + E[T_{t1}] \\
&\quad + \sum_{i=1}^{M} \gamma_i \left[ E[t_{w2}] + t_{p2} + \frac{B}{S_i} + \frac{\lambda \gamma_i \left( t_{p2}^2 + \frac{B^2}{S_i^2} \right)}{2\left(1 - \lambda \gamma_i \left( E[t_{w2}] + t_{p2} + \frac{B}{S_i} \right)\right)} \right] \\
&\quad + E[t_{w3}] + t_{p3} + \frac{K-1}{2}\,\tau_3
\end{aligned} \tag{6.29}$$

neglecting $E[t_{w2}]^2$ compared to $t_{p2}^2 + B^2/S_i^2$, where $E[t_{wj}]$, 
j = 1,2,3 are given by equation (6.16). 

Equations (6.16) and (6.29) show the relationship of 
the average response time for the packets to the various 
design parameters of the switch, namely, the total arrival 
rate $\lambda$, the number of input lines N, the size of storage at 
the shift register array K, the number of output lines M, 
packet size B, transmission rates of the output lines $S_i$, the 
processor times $t_{p1}$, $t_{p2}$ and $t_{p3}$, the times $\tau_1$, $\tau_2$, $\tau_3$ needed 
for unsuccessful polling of a packet at the first, second and 
third priority queues respectively, and $\gamma_i$, the destination 
function. This relationship can be used to study the effect 
of variation in any of these parameters on the average response 
time. In this respect, it is useful to draw graphs showing 
the variation in the average response time as some or all of 
these parameters are varied. Graphs of this type are presented 
in Figures 6.3 - 6.22. Further explanation of these graphs is 
presented in section 6.3.7. 


6.3.6 The Average Queue Sizes 

For this pre-emptive resume queue one can also obtain 
average queue sizes. The average number of packets waiting in 
the j-th queue is [7,8] 

$$E[w_j] = \frac{1}{1 - \sum_{i=1}^{j-1} \rho_i} \left[ \rho_j \sum_{i=1}^{j-1} \rho_i + \frac{\lambda_j \sum_{i=1}^{j} \lambda_i E[T_i^2]}{2\left(1 - \sum_{i=1}^{j} \rho_i\right)} \right] \tag{6.30}$$

where $\lambda_1 = \lambda_2 = \lambda_3 = \sum_{i=1}^{N} \lambda_{1i} = \lambda$; $\rho_1$, $\rho_2$ and $\rho_3$ are given 
by equations (6.7), (6.11) and (6.15) respectively, and $E[T_1^2]$, 
$E[T_2^2]$ and $E[T_3^2]$ are given by equations (6.6), (6.10) and (6.14) 
respectively. We are specifically interested in the queue size 
in the shift register array. This shift register array stores 
the packets that are waiting for the output function and the 
routing function. Hence, the required average queue size 
is $E[w_2] + E[w_3]$. A number of graphs showing the variation in 
$E[w_j]$, j=1,2,3 have been obtained from equation (6.30). These 
graphs are shown in Figures 6.23 - 6.29. Further explanation 
of these graphs is presented in section 6.3.7. These graphs 
show the average queue sizes. However, we may be interested 
in finding the queue size necessary for a given utilization factor 
and probability of overflow. These results can be used to 
obtain an approximate answer to this question. If the utilization 
factor is about .6 and the probability of overflow is $10^{-3}$, then 
the required buffer size is approximately ten times the average 
buffer occupancy. For smaller utilization factors, the required 
buffer size is smaller still [9]. 
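
The queue-size expression and the ten-times rule of thumb can be combined into a rough sizing estimate for the shift register array. The sketch below is an assumption-laden illustration: it takes the average queue size at level j to be the arrival rate times the pre-emptive resume waiting time (Little's formula), and it applies the factor of ten quoted above for a utilization of about .6 and an overflow probability of $10^{-3}$. The names `mean_queue_sizes` and `shift_register_locations` are hypothetical.

```python
import math


def mean_queue_sizes(arrival_rates, mean_t, mean_sq_t):
    """Average number of packets waiting at each priority level (index 0 = highest)."""
    rho = [lam * m for lam, m in zip(arrival_rates, mean_t)]
    sizes = []
    for j in range(len(rho)):
        sigma_prev = sum(rho[:j])      # load of all higher-priority levels
        sigma_j = sigma_prev + rho[j]  # load up to and including level j
        if sigma_j >= 1.0:
            raise ValueError(f"cumulative utilization {sigma_j:.3f} >= 1 at level {j + 1}")
        second_moment_sum = sum(
            lam * s for lam, s in zip(arrival_rates[: j + 1], mean_sq_t[: j + 1])
        )
        size = (
            rho[j] * sigma_prev
            + arrival_rates[j] * second_moment_sum / (2.0 * (1.0 - sigma_j))
        ) / (1.0 - sigma_prev)
        sizes.append(size)
    return sizes


def shift_register_locations(sizes, factor=10.0):
    """Rough number of packet-size locations for the shift register array.

    The array holds packets waiting for output (level 2) and for routing
    (level 3), so the average occupancy is sizes[1] + sizes[2]. The default
    factor of 10 is the rule of thumb for utilization near .6 and an
    overflow probability of 1e-3; other operating points need another factor.
    """
    average_occupancy = sizes[1] + sizes[2]
    return math.ceil(factor * average_occupancy)
```

Because the factor depends on both the utilization and the tolerated overflow probability, the result should be read as a first estimate of K rather than a guaranteed bound.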

6.3.7 Interpretation of the Graphs Showing the Effect of 
Various Design Parameters on the Performance of the 
Proposed Packet Switch 

A number of graphs showing the effect of the various 
design parameters on the average waiting times and the average 
queue sizes at the three queues and the overall average response 
time are presented in Figures 6.3 through 6.29. 

(a) The Average Waiting Times at the Three Queues 

The effects of $\lambda$, N, M, $\phi$ and K on the average waiting 
times at the three queues are shown in Figures 6.3 through 6.13. 

Average waiting time at queue 1 vs. $\lambda$, N, $\tau_1$ and $t_{p1}$. 

Figure 6.3 shows the effect of the utilization factor $\rho_1$ 
on $E[t_{w1}]$, the average waiting time at queue 1. $E[t_{w1}]$ increases 
as $\rho_1$ increases and becomes very large as $\rho_1$ approaches 1. The 
effect of $\lambda$, N and $t_{p1}$ on $E[t_{w1}]$ can also be obtained from this 
graph by calculating the corresponding $\rho_1$ using equations (6.1) 
through (6.7) and using this value of $\rho_1$ in Figure 6.3. 


Average waiting time at queue 2 vs. $\lambda$, N, $\tau_2$, $t_{p2}$ and M. 

The effect of $\rho_2$, the utilization factor, on $E[t_{w2}]$, the 
average waiting time at queue 2, is shown in Figure 6.4. Because 
the packets at the input queue (queue #1) have priority 
over those at the output queue (queue #2), $E[t_{w2}]$ depends 
on both $\rho_1$ and $\rho_2$. The family of graphs in Figure 6.4 shows the 
effect of $\rho_2$ on $E[t_{w2}]$ for a number of values of $\rho_1$. It should 
be noted that $\rho_1$ has a dominant effect on $E[t_{w2}]$, and for values 
of $\rho_1$ close to 1, $E[t_{w2}]$ increases rapidly. This indicates that 
when the input queue is heavily loaded, the processor does not 
have much time for the second queue. It is also observed from 
equations (6.8) through (6.11) that $\rho_2$ is related to the number 
of input lines N, the arrival rate $\lambda$, the polling time $\tau_2$, the 
processor setting up time $t_{p2}$ and the number of output lines M. 
Hence, the effect of any of these parameters on $E[t_{w2}]$ can be 
obtained from Figure 6.4 by using the corresponding values of 
$\rho_2$ and $\rho_1$. It can be seen from equation (6.16) that $E[t_{w2}]$ 
contains a term $\frac{1}{1 - \rho_1 - \rho_2}$. Hence, if $\rho_1 + \rho_2$ approaches 1, then 
$E[t_{w2}]$ increases rapidly. Also, if $\rho_1 + \rho_2 > 1$, then $E[t_{w2}]$ 
may become negative. Thus, to have a reasonable value of 
$E[t_{w2}]$, $\rho_1 + \rho_2$ should be less than unity. 

Average waiting time at queue 3 vs. $\lambda$, N, K, M and the polling and processor times. 

Figures 6.5 through 6.10 present the effect of $\rho_3$, the 
utilization factor, on $E[t_{w3}]$, the average waiting time at queue 
3, for a number of values of $\rho_1$, $\rho_2$ and K, the number of 
packet-size storage units in the shift register array. Figures 6.5 
through 6.7 show the effect of K on $E[t_{w3}]$ for the same values of 
$\rho_1$, $\rho_2$ and $\rho_3$. It is seen from these graphs that for any given 
values of $\rho_1$, $\rho_2$ and $\rho_3$ (e.g., $\rho_1$ = .156, $\rho_2$ = .18 and $\rho_3$ = .343), 
$E[t_{w3}]$ is smaller for K = 10 than for both K = 5 and K = 20. 

This indicates that for a given data arrival rate and processor 
speed, there is an optimum value of K that produces minimum 
$E[t_{w3}]$. For values of K below this optimum value $E[t_{w3}]$ increases 
as there may not be sufficient storage space available. Hence, 
the processor cannot immediately set up a transfer from the input 
buffer to the shift register array and thus the processor has to 
spend more than the usual time servicing each incoming input 
packet which, in turn, increases the delay in servicing the 
shift register array. This points to a possible tie-up situation 
and, hence, sufficient storage should be provided to avoid this 
breakdown of the process. On the other hand, as K increases, 
$E[t_{w3}]$ increases simply because more time is spent in polling 
these storage units. 

Figures 6.7 through 6.10 show the effect of $\rho_1$ on $E[t_{w3}]$ 
for given values of $\rho_2$, $\rho_3$ and K. These figures show that as 
$\rho_1$ increases (with the same values of $\rho_2$, $\rho_3$ and K), $E[t_{w3}]$ 
increases very rapidly, indicating a dominating effect of $\rho_1$ 
on $E[t_{w3}]$. This is because if the input queue is utilized 
heavily, then the processor does not get time to serve the 
second and the third queues, giving rise to higher delay at 
these latter queues. 

It should be pointed out that $E[t_{w3}]$ involves a term 
$\frac{1}{1 - \rho_1 - \rho_2 - \rho_3}$ (cf. equation (6.16)), and, hence, as 
$\rho_1 + \rho_2 + \rho_3$ approaches unity, $E[t_{w3}]$ increases rapidly and if 
$\rho_1 + \rho_2 + \rho_3 > 1$, then $E[t_{w3}]$ may be negative. Hence, 
$\rho_1 + \rho_2 + \rho_3$ should be kept less than unity. 

Average waiting times vs. clock cycle time of processor. 

One of the objectives of this work has been to find out 
the effect of the speed of the microprocessor on the performance 
of the packet switch. For this purpose, graphs have been 
obtained showing the effect of $\phi$, the processor clock cycle time, 
on $E[t_{w1}]$, $E[t_{w2}]$ and $E[t_{w3}]$ as shown in Figures 6.11, 6.12 and 
6.13 respectively. 

Seven values of the clock cycle time, namely 0, 25 ns, 50 ns, 
75 ns, 100 ns, 125 ns and 150 ns have been considered. It is 
seen from these graphs that the clock cycle time has a prominent 
effect on the waiting times. An arrival rate of $\lambda = 8 \times 10^4$ 
packets/sec has been used in generating these graphs and the 
corresponding values of $\rho_1$, $\rho_2$ and $\rho_3$ as obtained from equations 
(6.7), (6.11) and (6.15) respectively are also shown on these 
graphs. For the AMD 2900 bit slice microprocessor used in the 
present design, the clock cycle time is approximately 120 ns. 

The corresponding values of $E[t_{w1}]$, $E[t_{w2}]$ and $E[t_{w3}]$ are [...] ns, 
1.7 μs and 11.5 μs respectively. 

In the future as more powerful microprocessors (with smaller 
clock cycle times) become available, the corresponding waiting 
times at the various queues can be obtained from these graphs. 
Other arrival rates also can be used in obtaining similar graphs 
provided that the corresponding $\rho_1 + \rho_2 + \rho_3$ remains less than 
unity. 

(b) The Overall Average Response Time 

The effect of the various parameters on $E[t_q]$, the 
overall average response time, is shown in Figures 6.14 through 
6.22. 

Overall average response time vs. packet size B. 

Figure 6.14 shows the effect of the packet size B on the 
overall average response time $E[t_q]$. Four graphs, each 
corresponding to a different set of ($\rho_1$, $\rho_2$, $\rho_3$), are shown. It is 
seen that in each case the overall average response time 
increases at the same moderate rate as B goes from 1000 bits 
to 10,000 bits. This is a very useful result because the 
throughput of the switch increases directly as B, whereas the 
corresponding response time increases at a much slower rate. 
Thus, the throughput can be increased considerably without 
suffering severe penalty in response time. It is to be noted 
that $\rho_1$, $\rho_2$ and $\rho_3$ do not depend on B. It is the shifting times 
that depend on B. Hence, the response time for a given B can be 
reduced by employing faster hardware for shifting of data. 

Overall average response time vs. destination function. 

Figures 6.15 and 6.16 show the effect of destination 
functions on the overall average response time $E[t_q]$. In Figure 
6.15, all output lines are assumed to have equal capacities. 
Also, five different sets of destination functions have been 
used. The destination function sets 1 and 2 represent random 
distribution of data to the various output lines. Set 3 
represents uniform distribution of data to the output lines. The 
fourth set is such that half of all the data go to the output 
line number 1. The output lines 2, 3, 4 and 5 receive only ten 
percent of the data each. The rest of the lines receive only 
two percent of the data each. This is a biased destination function. 
The fifth set again represents a biased destination function 
with the output line number 2 receiving fifty percent of the 
data. The capacities of all output lines are the same. It is 
observed from Figure 6.15 that the overall average response 
time is minimum for the uniform destination function. Also, 
for the biased destination functions, the response times are 
considerably higher than that for the uniform destination function 
case. The input arrival rate is chosen such that the utilization 
factor for each of the output lines is less than unity. 

For Figure 6.16 the same sets of destination functions 
and same values of other parameters are used except that in 
this case the capacities of the output lines are given by 
$S_i = 5\lambda B\gamma_i$. Here, the capacity of each output line is 
proportional to the amount of data destined for it. Because of this, 
the response time remains constant for all the destination 
functions. 
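
The constancy reported for Figure 6.16 can be made plausible with a short calculation. The following is a sketch only, assuming that the capacity rule reads $S_i = 5\lambda B\gamma_i$, that the second moment of the service time is dominated by $B^2/S_i^2$, and that the processor terms $t_{p2}$ and $E[t_{w2}]$ are small compared with the transmission time $B/S_i$:

```latex
\begin{align*}
\frac{B}{S_i} &= \frac{1}{5\lambda\gamma_i}, &
\lambda\gamma_i \frac{B}{S_i} &= \frac{1}{5}, \\
\sum_{i=1}^{M} \gamma_i \frac{B}{S_i} &= \frac{M}{5\lambda}, &
\sum_{i=1}^{M} \gamma_i \,
  \frac{\lambda\gamma_i \, B^2 / S_i^2}{2\left(1 - \tfrac{1}{5}\right)}
  &= \sum_{i=1}^{M} \frac{1}{40\lambda} = \frac{M}{40\lambda}.
\end{align*}
```

Under these assumptions both the weighted transmission time and the weighted waiting time at the output lines depend only on M and $\lambda$, not on the individual $\gamma_i$, which agrees with the flat response time observed across the five destination function sets.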

Overall average response time vs. output line speeds $S_i$. 

Figures 6.17 through 6.22 show the variation of the 
overall average response time due to changes in the capacities of 
the output lines. Three types of capacity assignments are 
considered: uniform, proportional and square root. In the 
uniform capacity assignment, the capacities of all the output 
lines are the same ($S_i = \lambda B\alpha / M$). In the proportional assignment, 
each output line is given capacity proportional to the traffic 
on it ($S_i = \lambda B\gamma_i\alpha$). In the square root capacity assignment, 
every line is assigned minimum capacity equal to the traffic 
expected on this line. Additional capacities are then assigned 
to each line in proportion to the square root of the traffic 
expected on that line. Figures 6.17 through 6.19 show the 
response time for uniform destination functions ($\gamma_i$ = .1 for 
all i). With this destination function, identical response 
times are obtained for all three types of capacity assignments 
as shown in Figures 6.17 through 6.19. This is so because 
with this destination function all three capacity assignments 
result in the same capacity values for the output lines. In the 
case when $\alpha$ = 1, i.e., the capacity assignment is equal to the 
average traffic on a line, the response time is undefined as 
one or more terms in equation (6.29) may be negative. It 
is observed from these graphs that the response time decreases 
as $\alpha$ increases, the decrease being sharper initially and more 
sluggish for $\alpha$ > 5. Thus, after certain values of $\alpha$, increasing 
the line capacities may not reduce the response time 
correspondingly. That means a point of diminishing return sets in. 
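
The three capacity assignment strategies can be compared with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and for the square root rule it assumes that the total capacity to be distributed is $\alpha\lambda B$, the same total that the uniform and proportional rules hand out. The text does not state that total explicitly.

```python
import math


def uniform_capacity(arrival_rate, packet_bits, gammas, alpha):
    """Every output line gets the same share of the total capacity."""
    m = len(gammas)
    return [alpha * arrival_rate * packet_bits / m for _ in gammas]


def proportional_capacity(arrival_rate, packet_bits, gammas, alpha):
    """Each line gets capacity proportional to the traffic destined for it."""
    return [alpha * arrival_rate * packet_bits * g for g in gammas]


def square_root_capacity(arrival_rate, packet_bits, gammas, alpha):
    """Each line first gets its expected traffic; the spare capacity is then
    shared in proportion to the square root of that traffic.

    Assumption: the total capacity is alpha * lambda * B.
    """
    traffic = arrival_rate * packet_bits          # total offered load, bits/sec
    spare = (alpha - 1.0) * traffic               # what is left after the minimum
    root_sum = sum(math.sqrt(g) for g in gammas)
    return [traffic * g + spare * math.sqrt(g) / root_sum for g in gammas]


if __name__ == "__main__":
    # Hypothetical operating point: 8e4 packets/sec, 1000-bit packets, alpha = 5.
    uniform_gammas = [0.1] * 10
    for rule in (uniform_capacity, proportional_capacity, square_root_capacity):
        print(rule.__name__, rule(8e4, 1000, uniform_gammas, 5.0)[:2])
```

With the uniform destination function all three rules return the same capacities, which is consistent with the identical response times noted for Figures 6.17 through 6.19; with a biased destination function they differ.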

These general comments apply to Figures 6.20 through 6.22 
also. However, for these cases, the destination function is a 
biased one and, hence, the response time does not have the 
exact same value for the three different capacity assignment 
strategies . 

(c) The Effect of the Various Design Parameters 
on the Average Queue Sizes 

The number of packets waiting at the various 
queues for various design parameters is shown in Figures 6.23 
through 6.29. 

Average queue sizes vs. $\lambda$, N, M, K, $\tau_1$, $\tau_2$, $\tau_3$, $t_{p1}$, $t_{p2}$ and $t_{p3}$. 

Figure 6.23 shows the variation in the average queue size 
$E[w_1]$ with $\rho_1$, the utilization factor at queue 1. This curve 
is similar to that for $E[t_{w1}]$. This follows from Little's 
formula which states that the average queue size = average 
arrival rate x average time spent in the system. As $\rho_1$ 
approaches unity, the queue size increases rapidly. However, 
as queue 1 has the highest priority, the queue size is 
rather small for $\rho_1$ < .9. 

Figure 6.24 shows the average queue size $E[w_2]$ as a 
function of $\rho_2$, the utilization factor at queue 2, for a number 
of values of $\rho_1$. For reasonable results $\rho_1 + \rho_2$ should be less 
than unity. It is also seen from this figure that $\rho_1$ has a 
dominant effect on $E[w_2]$. 

Figures 6.25 through 6.29 show how $E[w_3]$, the queue size 
at the third queue, changes with $\rho_1$, $\rho_2$, $\rho_3$ and K. Figures 6.25 
and 6.26 show $E[w_3]$ for K = 10 and K = 50 respectively for given 
values of $\rho_1$, $\rho_2$ and $\rho_3$. It is seen that the expected queue 
size $E[w_3]$ is somewhat higher for K = 50 than for K = 10. This 
is due to the additional polling time necessary for finding the 
stored packets. It appears that K = 10 is reasonable for $\rho_1$ = .1. 
However, it is seen from Figures 6.26 through 6.29 that $E[w_3]$ 
increases rather quickly as $\rho_1$ increases. Hence, for higher 
values of $\rho_1$, a larger value of K should be used and the 
corresponding queue size be determined. For the purpose of this 
report, K = 50 is used and the corresponding $E[w_3]$ are shown. 

If a higher value of $\rho_1$ is intended to be used, then a K larger 
than 50 has to be used. 


6.4 The Three Processor Design

6.4.1 Introduction 

It appears from the proposed three processor archi- 
tecture and operation of the switch that queues build up in 
the switch as shown in Figure 6.30. In this queueing model, 
packets queue for service by the processors in three places. 
Firstly, the arriving packets queue for inputting into the 
shift register array. Secondly, these packets await the 
routing service which includes header analysis, error analy- 
sis, and separating the packets into software output queues. 
Finally, these packets queue for outputting. All the packet 
switch functions involve service by polling circuits in addi- 
tion to processor service. 

The average time spent by a packet in the switch (average 
response time) is the sum of the waiting times and the service 
times at the three queues. Next, expressions are derived for 
the average waiting times, the average response times, and the 
average queue sizes at the various queues. 

6.4.2 Expressions for the Waiting Times at the Various 
Queues and the Overall Average Response Time 

The assumptions made for the queueing model for the
single processor design are also made here. Also, the
analytical developments used in section 6.3 are valid
here except that in the three processor design, each processor
is performing only one function. Hence, the average waiting
time at each queue depends on the corresponding utilization
factor only. Thus, the average waiting times at the routing
and the output queues depend on ρ_3 and ρ_2 respectively and
not on the other ρ's.

Following the definitions and analytical developments
similar to those for the single processor design (cf. section
6.3), it can be shown that for the three processor design, the
overall average response time E(t_q) is

\[
\begin{aligned}
E[t_q] &= E[t_{q1}] + E[t_{q2}] + E[t_{q3}] \\
       &= E[t_{w1}] + t_{p1}
        + \sum_{i=1}^{M} \gamma_i \left[ E[t_{w2}] + t_{p2} + \frac{B}{S_i}
        + \frac{\gamma_i \lambda \left( t_{p2} + \frac{B}{S_i} \right)^2}
               {2 \left( 1 - \gamma_i \lambda \left( E[t_{w2}] + t_{p2} + \frac{B}{S_i} \right) \right)} \right]
        + E[t_{w3}] + t_{p3}
\end{aligned}
\tag{6.31}
\]

neglecting E^2[t_w2] compared to (t_p2 + B/S_i)^2, where E(t_wj),
j = 1,2,3, the average waiting times at the j-th queue,
are given by [7,8]


\[
E(t_{wj}) = \frac{\lambda E[T_j^2]}{2(1 - \rho_j)}, \quad j = 1, 2, 3
\tag{6.32}
\]

where E[T_j^2], j = 1,2,3 and ρ_j, j = 1,2,3 are given by equations
(6.6), (6.10), (6.14) and (6.7), (6.11) and (6.15) respectively.
The difference between equations (6.16) and (6.32)
should be noted. Also, the polling times t_1, t_2 and t_3
are assumed to be negligible as polling at all three queues
is done by hardware in this case.
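
The role of the mean square service time in (6.32) can be illustrated with a small numerical sketch. It assumes ρ = λ·E[T], which may differ from the report's equations (6.7), (6.11) and (6.15), and the arrival rate and service time are hypothetical values chosen only for illustration. The function name `mean_wait` is likewise an assumption of this sketch.

```python
def mean_wait(lam, mean_service, mean_sq_service):
    """Mean waiting time in the form of (6.32).

    lam             -- packet arrival rate (packets/s)
    mean_service    -- E[T], mean service time (s)
    mean_sq_service -- E[T^2], mean square service time (s^2)
    Assumes rho = lam * E[T] (an assumption of this sketch).
    """
    rho = lam * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization factor must lie in [0, 1)")
    return lam * mean_sq_service / (2.0 * (1.0 - rho))


# Hypothetical numbers: 1e5 packets/s and a 5 microsecond mean service time.
LAM, T = 1e5, 5e-6

# Constant service time: E[T^2] = T^2.
w_constant = mean_wait(LAM, T, T * T)
# Exponentially distributed service time: E[T^2] = 2 T^2.
w_exponential = mean_wait(LAM, T, 2.0 * T * T)

print(f"constant service:    {w_constant * 1e6:.2f} microseconds")
print(f"exponential service: {w_exponential * 1e6:.2f} microseconds")
```

For the same utilization factor, the waiting time scales directly with E[T^2], which is why queues with equal ρ but different service routines can show different waiting times.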

Equations (6.31) and (6.32) show the relationship of the
average response time for the packets to the various design
parameters of the switch, namely, the total arrival rate λ,
the number of input lines N, the number of output lines M,
packet size B, transmission rates of the output lines S_i, the
processor times t_p1, t_p2 and t_p3, and γ_i, the destination
function. This relationship can be used to study the effect
of variation in any of these parameters on the average response
time. In this respect, it is useful to draw graphs showing
the variation in the average response time as some or all of
these parameters are varied. Some graphs of this type are
presented in Figures 6.31 - 6.55.

6.4.3 Expressions for the Average Queue Sizes at the 
Various Queues 

Following the developments in section 6.3.6 for
the average queue sizes for the single processor design, it
can be shown that for the three processor design the average
number of packets waiting at the j-th queue [7,8] is

\[
E[W_j] = \rho_j + \frac{\lambda^2 E[T_j^2]}{2(1 - \rho_j)}, \quad j = 1, 2, 3
\tag{6.33}
\]

where E[T_j^2], j = 1,2,3 and ρ_j, j = 1,2,3 are given by equations
(6.6), (6.10), (6.14) and (6.7), (6.11) and (6.15) respectively.
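
As a consistency check, the queue size in (6.33) can be compared against Little's formula (average queue size = arrival rate x average time in the system). The sketch below assumes a constant service time T, so that E[T^2] = T^2 and ρ = λT; the numerical values are hypothetical.

```python
def mean_queue_size(lam, mean_service, mean_sq_service):
    """Average number of packets at a queue, in the form of (6.33).

    Assumes rho = lam * E[T] (an assumption of this sketch).
    """
    rho = lam * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization factor must lie in [0, 1)")
    return rho + lam**2 * mean_sq_service / (2.0 * (1.0 - rho))


# Hypothetical numbers: 8e4 packets/s, constant 1.8 microsecond service.
LAM, T = 8e4, 1.8e-6

queue_size = mean_queue_size(LAM, T, T * T)

# Little's formula: arrival rate x (waiting time + service time).
wait = LAM * T * T / (2.0 * (1.0 - LAM * T))
little = LAM * (wait + T)

print(f"(6.33) queue size: {queue_size:.6f}")
print(f"Little's formula:  {little:.6f}")
```

The two values agree because (6.33) is the waiting time form of (6.32) plus the service time, multiplied through by the arrival rate.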
We are specifically interested in the queue size in
the shift register array. This shift register array stores
the packets that are waiting for the output function and the
routing function. Hence, the required average queue size is
E[W_2] + E[W_3]. A number of graphs showing the variation in
E[W_j], j = 1,2,3 have been obtained from equation (6.33). These
graphs are shown in Figures 6.56 - 6.64. Further explanation
of these graphs is presented in section 6.4.4. These graphs
show the average queue sizes. However, we may be interested
in finding the queue size necessary for a given utilization factor
and probability of overflow. These results can be used to
obtain an approximate answer to this question. If the
utilization factor is about .6 and the probability of overflow is 10^-3,
then the required buffer size is approximately ten times the
average buffer occupancy. For smaller utilization factors,
the required buffer size is smaller still [9].
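
The rule of thumb above can be turned into a rough sizing sketch for the shift register array. It assumes a constant service time at both the output and the routing queues, so that (6.33) reduces to ρ + ρ²/(2(1 - ρ)), and it applies the factor of ten quoted for a utilization factor of about .6 and an overflow probability of 10^-3. The function names are hypothetical.

```python
import math


def mean_occupancy(rho):
    """Average number of packets at a queue with constant service time.

    This is (6.33) with E[T^2] = T^2, so lam^2 * E[T^2] = rho^2.
    The constant-service simplification is an assumption of this sketch.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization factor must lie in [0, 1)")
    return rho + rho**2 / (2.0 * (1.0 - rho))


def required_buffer(avg_occupancy, factor=10.0):
    """Rule of thumb from the text: about ten times the average occupancy."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(factor * avg_occupancy, 9))


# The shift register array holds packets waiting for the output and the
# routing functions, i.e. E[W_2] + E[W_3]; both utilizations set to .6 here.
shift_register_occupancy = mean_occupancy(0.6) + mean_occupancy(0.6)

print(f"average occupancy: {shift_register_occupancy:.2f} packets")
print(f"suggested buffer:  {required_buffer(shift_register_occupancy)} packets")
```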

6.4.4 Interpretation of the Graphs Showing the Effect
of the Various Design Parameters on the Performance
of the Proposed Three Processor Packet Switch

(a) Effect of Contention on the Average Waiting
Times and the Average Queue Sizes at the Various
Queues

In the three processor design, the problem of
contention among the processors for using common resources has
been resolved as much as possible. However, possible contention
over the use of the output queue lists by the routing and the
output processors could not be totally removed. It appears
from Table 5.1 that the durations of the routing service
routines and the output service routine increase by two cycle
times each in the presence of contention over those in the
absence of contention. Early on we wanted to find out the
effect of contention on the average waiting times and the
average queue sizes at the three queues. The graphs in Figures
6.31 through 6.41 show the effect of contention on the average
waiting times and the average queue sizes. An examination and
comparison of the corresponding graphs with and without
contention show that the effect of contention on the average waiting
times and the average queue sizes at the routing and output queues
is negligible. The input queue, of course, is not affected by
contention. For this evaluation, two possible situations have
been considered: no contention and contention at all times.
The corresponding results give the lower and upper bounds on the
effect of contention. Results for other degrees of contention
lie in between these two limits.

(b) The Average Waiting Times and the Response Times
at the Three Queues

Figure 6.42 shows the effect of the utilization
factors ρ_1, ρ_2 and ρ_3 on the corresponding average waiting
times. The average waiting times increase as the corresponding
ρ increases. For values of ρ beyond .8, the waiting times
become very high, and they go to infinity for ρ equal to unity.
Actual values of these waiting times for a given value of ρ
differ due to the difference in the values of E[T_1^2], E[T_2^2]
and E[T_3^2], which arises from the difference in the values of
t_p1, t_p2 and t_p3 as noted on Figure 6.42. Figure 6.43 shows
similar effects on the average response times at the three
queues.
Average waiting times vs. clock cycle time of processor

One of the objectives of this work has been to find out
the effect of the speed of the microprocessors on the performance
of the packet switch. For this purpose, graphs have been
obtained showing the effect of φ, the processor clock cycle time,
on E(t_w1), E(t_w2) and E(t_w3) as shown in Figures 6.44, 6.45 and
6.46 respectively.

Eleven values of the clock cycle time have been considered.
The corresponding values of the respective utilization factors
are shown on these graphs. It is seen from these graphs that
the clock cycle time has a prominent effect on the waiting times.
An arrival rate of λ = 8 x 10^4 packets/sec has been used in
generating these graphs and the corresponding values of ρ_1, ρ_2
and ρ_3 as obtained from equations (6.7), (6.11) and (6.15)
respectively are also shown on these graphs. For the AMD 2900
bit slice microprocessor used in the present design, the clock
cycle time is approximately 120 ns. The corresponding values
of E(t_w1), E(t_w2) and E(t_w3) are 80 ns, 150 ns and 166 ns
respectively.

In the future, as more powerful microprocessors (with
smaller clock cycle times) become available, the corresponding
waiting times at the various queues can be obtained from these
graphs. Other arrival rates also can be used in obtaining
similar graphs provided that the corresponding ρ's remain less
than unity.
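
The dependence of the waiting time on the clock cycle time can be sketched numerically. The example below assumes a service routine of a fixed number of clock cycles with a constant service time, so that E[T^2] = T^2 and ρ = λT in (6.32). The cycle count `CYCLES` is hypothetical (the actual routine lengths are those of Table 5.1), and only the arrival rate is taken from the text.

```python
LAM = 8e4      # packets/s, the arrival rate quoted in the text
CYCLES = 15    # hypothetical routine length in clock cycles


def mean_wait(phi, cycles=CYCLES, lam=LAM):
    """Mean waiting time for clock cycle time phi (seconds).

    Service time is cycles * phi; returns infinity once the
    utilization factor reaches unity.
    """
    service = cycles * phi
    rho = lam * service
    if rho >= 1.0:
        return float("inf")
    return lam * service**2 / (2.0 * (1.0 - rho))


for phi_ns in (60, 120, 240, 480):
    phi = phi_ns * 1e-9
    print(f"phi = {phi_ns:3d} ns  rho = {LAM * CYCLES * phi:.3f}  "
          f"wait = {mean_wait(phi) * 1e9:.1f} ns")
```

Because the service time enters (6.32) both squared in the numerator and through ρ in the denominator, the waiting time grows faster than linearly with the clock cycle time.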

(c) The Overall Average Response Time

Effect of the various parameters on E(t_q), the
overall average response time, is shown in Figures 6.47 through
6.55.

Overall average response time vs. packet size B

Figure 6.47 shows the effect of the packet size B on
the overall average response time E(t_q). Four graphs, each
corresponding to a different set of (ρ_1, ρ_2, ρ_3), are shown.
It is seen that in each case the overall average response time
increases at the same moderate rate as B goes from 1000 bits to
10,000 bits. This is a very useful result, because the
throughput of the switch increases directly as B whereas the
corresponding response time increases at a much slower rate. Thus
the throughput can be increased considerably without suffering
a severe penalty in response time. It is to be noted that ρ_1, ρ_2
and ρ_3 do not depend on B. It is the shifting times that depend
on B. Hence, the response time for a given B can be reduced by
employing faster hardware for shifting of data.

Overall average response time vs. destination function γ_i

Figures 6.48 and 6.49 show the effect of destination
functions on the overall average response time E(t_q). In Figure
6.48 all output lines are assumed to have equal capacities.
Also, five different sets of destination functions have been
used. The destination function sets 1 and 2 represent random
distribution of data to the various output lines. Set 3
represents uniform distribution of data to the output lines. The
fourth set is such that half of all the data go to the output
line number 1. The output lines 2, 3, 4 and 5 receive only
ten percent of the data each. The rest of the lines receive
only two percent of the data. This is a biased destination
function. The fifth set again represents a biased destination
function with the output line number 2 receiving fifty percent
of the data. It is observed from Figure 6.48 that the overall
average response time is minimum for the uniform destination
function. Also, for the biased destination functions the
response time is considerably higher than that for the uniform
destination function case. The input arrival rate is chosen
such that the utilization factor for each of the output lines
is less than unity.
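
The effect of a biased destination function on equal-capacity lines can be seen from the per-line load alone. The sketch below assumes ten output lines (consistent with a uniform destination function of .1 per line), builds the uniform set and the fourth (biased) set described above, and computes the load γ_i·λ·B/S on each line. The arrival rate, packet size and line capacity used here are hypothetical illustration values.

```python
import math


def line_utilization(gamma, lam, packet_bits, capacity):
    """Load on each output line: gamma_i * lam * B / S (equal capacities S)."""
    return [g * lam * packet_bits / capacity for g in gamma]


# Destination function sets for ten output lines (an assumption of this sketch).
UNIFORM = [0.1] * 10                        # set 3: uniform distribution
BIASED = [0.5] + [0.1] * 4 + [0.02] * 5     # set 4: half the data to line 1

# Hypothetical parameters: packets/s, bits per packet, bits/s per line.
LAM, B, S = 8e4, 10_240, 2e9

uniform_load = line_utilization(UNIFORM, LAM, B, S)
biased_load = line_utilization(BIASED, LAM, B, S)

print(f"uniform set, busiest line: {max(uniform_load):.4f}")
print(f"biased set, busiest line:  {max(biased_load):.4f}")
```

With equal capacities the busiest line under the biased set carries five times the load of any line under the uniform set, which is consistent with the higher response time observed for biased destination functions.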

For Figure 6.49 the same sets of destination functions
and the same values of other parameters are used except that in
this case the capacities of the output lines are given by
S_i = 5λBγ_i. Here the capacity of each output line is
proportional to the amount of data destined for it. Because
of this, the response time remains constant for all the
destination functions.

Overall average response time vs. output line speeds

Figures 6.50 through 6.55 show the variation of the overall
average response time due to changes in the capacities of the
output lines. Three types of capacity assignments are
considered: uniform, proportional and square root. In the
uniform capacity assignment the capacities of all the output
lines are the same (S_i = αλB/M). In the proportional assignment
each output line is given capacity proportional to the traffic
expected on it (S_i = λBγ_iα). In the square root capacity
assignment every line is assigned a minimum capacity equal to
the traffic expected on this line. Additional capacities
are then assigned to each line in proportion to the square
root of the traffic expected on that line. Figures 6.50
through 6.52 show the response time for the uniform destination
function (γ_i = .1 for all i). With this destination function
identical response times are obtained for all three types
of capacity assignments as shown in Figures 6.50 through 6.52.
This is so because with this destination function all three
capacity assignments result in the same capacity values for
the output lines. In the case when α = 1, i.e., the capacity
assignment is equal to the average traffic on a line, the
response time is undefined as one or more terms in
equation (6.29) may be negative. Hence the values of response
time for 2 < α < 10 are shown in these graphs. It is
observed from these graphs that the response time decreases
as α increases, the decrease being sharper initially and
more sluggish for α > 5. Thus, beyond a certain value of α,
increasing the line capacities may not reduce the response
time correspondingly; that is, a point of diminishing
returns sets in.

These general comments apply to Figures 6.53 through 6.55 
also. However, for these cases the destination function is 
a biased one and hence the response time does not have the 
exact same value for the three different capacity assignment 
strategies. 
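
The three capacity assignment strategies can be compared with a short sketch. It assumes that each strategy distributes the same total capacity αλB over the M output lines; this total, the function names and the numerical values are assumptions made for illustration, not values taken from the report.

```python
import math


def uniform(gamma, lam, bits, alpha):
    """Same capacity on every line: alpha * lam * B / M."""
    m = len(gamma)
    return [alpha * lam * bits / m for _ in gamma]


def proportional(gamma, lam, bits, alpha):
    """Capacity proportional to expected traffic: alpha * lam * B * gamma_i."""
    return [alpha * lam * bits * g for g in gamma]


def square_root(gamma, lam, bits, alpha):
    """Minimum capacity equal to the traffic on each line, with the spare
    capacity shared in proportion to the square root of that traffic."""
    traffic = [lam * bits * g for g in gamma]
    spare = alpha * lam * bits - sum(traffic)
    root_sum = sum(math.sqrt(x) for x in traffic)
    return [x + spare * math.sqrt(x) / root_sum for x in traffic]


LAM, B, ALPHA = 8e4, 10_240, 8.0          # hypothetical values
UNIFORM_GAMMA = [0.1] * 10
BIASED_GAMMA = [0.5] + [0.1] * 4 + [0.02] * 5

for name, rule in (("uniform", uniform), ("proportional", proportional),
                   ("square root", square_root)):
    same = rule(UNIFORM_GAMMA, LAM, B, ALPHA)[0]
    line1 = rule(BIASED_GAMMA, LAM, B, ALPHA)[0]
    print(f"{name:12s} uniform gamma: {same:.3e}  biased, line 1: {line1:.3e}")
```

With the uniform destination function all three rules give every line the same capacity, which is why Figures 6.50 through 6.52 coincide; with a biased destination function the rules diverge.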

(d) The Effect of the Various Design Parameters 
on the Average Queue Sizes 

The number of packets waiting at the various 
queues for various design parameters is shown in Figures 
6.56 through 6.64. 
Figure 6.56 shows the variation in the average queue
size E(W_1) with ρ_1, the utilization factor at queue 1.
This curve is similar to that for E(t_w1). This
follows from Little's formula, which states that the average
queue size = average arrival rate x average time spent in
the system. As ρ_1 approaches unity the queue size increases
rapidly. However, the queue size is rather small for ρ_1 < .9.
Similar comments also apply to Figures 6.57 and 6.58, which
show the variation of average queue sizes at the routing and
the output queues respectively.

Figures 6.59 and 6.60 show the effect of varying M, the
number of output lines, on the average queue sizes at the
output queue for two proportional capacity assignments to
these lines. It follows from these graphs that even with
proportional capacity assignment the output queue size
increases with the number of output lines. This increase
is mainly due to the work involved in demultiplexing data to
so many lines which may or may not be ready to receive data.

It is seen from Figures 6.61 and 6.62 that the average
queue size at the output queue does not increase much with 
increase in the packet size. This is an encouraging result 
as the throughput can be increased by increasing packet size 
without making the corresponding storage requirements too 
high. 

Figures 6.63 and 6.64 show that the queue size at the
output queue cannot be decreased much by using faster
processors. This is mainly because at the output queue the
major part of the service time is due to shifting time and
many packets wait for the output buffers to be available
rather than for service by the processor itself. It also
appears from a comparison of Figures 6.63 and 6.64 that
increasing the capacities of the output lines makes the
queue size go down considerably.

6.5 The Multiple Processor Design 

6.5.1 Introduction 

In the multiple processor architecture queues
build up in the switch as shown in Figure 6.65. In this
queueing model every packet queues for service by appropriate
processors in four places. Firstly, an incoming packet queues
for service by one of the input processors for inputting
into the shift register array. Secondly, this packet awaits
service by one of the sorting processors, which assigns it to
one of the routing processors. The routing processor services
it by putting it into one of the output queues. Lastly, this
packet is serviced by one of the output processors. Each of
these services involves service by appropriate processors and
polling circuits. However, the hardware polling times are
negligible.

It is assumed that at every stage of the service, e.g.
at the input service, the total number of packets arriving
there for service is equally divided among the processors
performing that function. This assumption is physically
reasonable as this will ensure that all the processors are
equally busy. Operation of the multiple processor design
indicates that at each stage of service there are a number of
single server queues in parallel.

The average time spent by a packet in the switch
(average response time) is the sum of the waiting times and
the service times at the four queues.

Analytical expressions are derived next for the average
waiting times, the average response times and the average queue
sizes at the various queues.

6.5.2 Analytical Expressions for the Waiting Times 
at the Various Queues and the Overall Average 
Response Time 

The assumptions made for the queueing model for the
single processor design are also made here. Also, the
analytical developments used in section 6.3 are valid here
except that

i) there is no interdependence among the functions, as each
function is performed by a number of processors dedicated
to this function.

ii) the packet arrival rate to each processor assigned to
the j-th function is λ_j/N_j, where N_j is the number of
processors performing this function and λ_j is the
overall packet arrival rate for this service. In
normal operation λ_j = λ for j = 1,2,3,4.

It should be noted that

j = 1 -> input function
j = 2 -> output function
j = 3 -> routing function
and j = 4 -> sorting function

Following the analytical developments similar to those for
the single and three processor designs, it can be shown that
the average waiting times at the j-th queue are given by [7,8]

\[
E(t_{wj}) = \frac{(\lambda_j / N_j) E[T_j^2]}{2(1 - \rho_j)}, \quad j = 1, 2, 3, 4
\tag{6.34}
\]

where

\[
E[T_j^2] = \text{mean square value of the service time} = t_{pj}^2
\tag{6.35}
\]

and

\[
\rho_j = \frac{\lambda_j t_{pj}}{N_j}
\tag{6.36}
\]

(neglecting the polling times).

For the queueing analysis the following values of the
service times have been used.

\[
t_{p1} = 15\phi, \quad t_{p2} = 19\phi, \quad t_{p3} = 9\phi, \quad t_{p4} = 13\phi
\tag{6.37}
\]


These values differ slightly from the values shown in table 5.2. 
The values in table 5.2 are the final refined values obtained 
after the queueing models have been developed using the earlier 
estimates of these quantities. However, the queueing results 
will not be much different using the values in table 5.2. 
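
The effect of adding processors at one service stage can be sketched as follows. The sketch treats each of the N_j processors as an independent single server queue receiving λ/N_j packets per second with a constant service time t_pj, so that E[T^2] = t_pj^2 and ρ_j = λ t_pj / N_j. The cycle counts are the service times quoted in (6.37); the 120 ns clock cycle time is the value mentioned for the three processor design and is only assumed to apply here. The waiting time expression is the form of (6.32) with the arrival rate divided by N_j, which is an assumption of this sketch.

```python
import math

PHI = 120e-9   # clock cycle time in seconds (assumed for this sketch)

# Service times in clock cycles, as quoted in (6.37).
SERVICE_CYCLES = {"input": 15, "output": 19, "routing": 9, "sorting": 13}


def stage_wait(lam, n_proc, cycles, phi=PHI):
    """Mean waiting time at one stage served by n_proc parallel processors."""
    service = cycles * phi
    per_processor_rate = lam / n_proc
    rho = per_processor_rate * service
    if rho >= 1.0:
        return math.inf   # the stage is saturated
    return per_processor_rate * service**2 / (2.0 * (1.0 - rho))


LAM = 2e6   # packets/s, one of the maximum arrival rates used in the graphs

for n in (1, 4, 8):
    w = stage_wait(LAM, n, SERVICE_CYCLES["input"])
    label = "saturated" if math.isinf(w) else f"{w * 1e6:.3f} microseconds"
    print(f"input stage, N = {n}: {label}")
```

A single input processor saturates well before this arrival rate, while four or more processors bring the utilization factor below unity and the waiting time falls quickly as processors are added.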

The average response times at these queues are

\[ E(t_{q3}) = E(t_{w3}) + t_{p3} = E(t_{w3}) + 9\phi \tag{6.40} \]

\[ E(t_{q4}) = E(t_{w4}) + t_{p4} = E(t_{w4}) + 13\phi \tag{6.41} \]

E(t_q2) is obtained by following the development in section 6.3.5.
The overall average response time of the switch is

\[ E(t_q) = \sum_{j=1}^{4} E(t_{qj}) \tag{6.42} \]

where E(t_qj), j = 1,2,3,4 are given by equations (6.38) through
(6.41) respectively.

Equations (6.34) through (6.42) show the relationship of
the average waiting and response times for the packets to the
various design parameters of the switch, namely, the total
arrival rate λ, the number of input lines N, the number of
output lines M, packet size B, transmission rates of the output
lines S_i, the processor times t_p1, t_p2, t_p3 and t_p4, γ_i,
the destination function, and N_j, the number of processors at
the various queues. These relationships can be used to study
the effect of variation in any of these parameters on the
performance of the switch. In this respect, it is useful to
draw graphs showing the variation in the average waiting and
response times as some or all of these parameters are varied.
Some graphs of this type are presented in Figures 6.66 - 6.79.
The aim here is to see how the waiting times and response
times vary as the number of processors at every service stage
is varied. Hence these figures show families of graphs with
N_j as a parameter. The effect of variation of the other parameters
should be similar to that shown for the single and three
processor designs.
6.5.3 Expressions for the Average Queue Sizes at the
Various Queues

Following the developments in section 6.4.3 for
the average queue sizes for the three processor design, it
can be shown that for the multiple processor design the
average number of packets waiting at the j-th queue [7,8] is

\[
E[W_j] = \rho_j + \frac{(\lambda / N_j)^2 t_{pj}^2}{2(1 - \rho_j)}
       = \frac{\lambda}{N_j} t_{pj}
       + \frac{(\lambda / N_j)^2 t_{pj}^2}{2 \left( 1 - \frac{\lambda}{N_j} t_{pj} \right)},
\quad j = 1, 2, 3, 4
\tag{6.43}
\]



We are specifically interested in the queue size in
the shift register array. This shift register array stores
the packets that are waiting for the output, sorting and the
routing functions. Hence, the required average queue size is
E[W_2] + E[W_3] + E[W_4]. A number of graphs showing the variation
in E[W_j], j = 1,2,3,4 have been obtained from equation (6.43).
These graphs are shown in Figures 6.80 - 6.83. Further
explanation of these graphs is presented in section 6.5.4.
These graphs show the average queue sizes. However, we may
be interested in finding the queue size necessary for a given
utilization factor and probability of overflow. These results
can be used to obtain an approximate answer to this question.
If the utilization factor is about .6 and the probability of
overflow is 10^-3, then the required buffer size is approximately
ten times the average buffer occupancy. For smaller utilization
factors, the required buffer size is smaller still [9].
6.5.4 Interpretation of the Graphs Showing the Effect
of the Various Design Parameters on the Performance
of the Proposed Multiple Processor Packet Switch

The major aim of the analysis is to see how the
average waiting times at the various queues vary for a given
overall arrival rate as the number of processors at these queues
is varied. Figures 6.66 - 6.68 show the effect of varying
the number of input processors on the average waiting time at the
input queue. These graphs also show the effect on the average
waiting time of varying the overall packet arrival rate for a
given number of input processors. These three figures differ
in the maximum value of λ, the packet arrival rate, that is
allowed. Maximum packet arrival rates of 2 x 10^6, 2 x 10^7 and
5 x 10^7 packets/sec have been used in Figures 6.66 - 6.68
respectively. The rationale for using these three maximum
values of λ is the following. For λ_max = 2 x 10^6 one can
observe clearly how the average waiting time varies for a
single input processor. However, the effect is not at all
clear for higher numbers of input processors. Using
λ_max = 2 x 10^7 and 5 x 10^7 shows the effect on the average
waiting time of varying the number of input processors.
For the same reason three values of λ_max have also been
used for the sorting, the routing and the output queues.

Figures 6.69 - 6.71 show the effect of varying the 
overall packet arrival rate on the average waiting time 
at the output queue. These graphs also show the effect 
on the average waiting time of varying the number of output 
processors. Similar results are shown in Figures 6.72 - 6.74 
and Figures 6.75 - 6.77 for the routing and the sorting 
queues respectively. 
Figure 6.78 shows the effect of varying B, the packet
size, on the overall average response time for a fixed number
of processors. The overall average response time increases
slightly as B increases.

Figure 6.79 shows the effect of destination functions on
the overall average response time E(t_q). In this figure all
output lines are assumed to have equal capacities. Also,
five different sets of destination functions have been used.
The destination function sets 1 and 2 represent random
distribution of data to the various output lines. Set 3
represents uniform distribution of data to the output lines.
The fourth set is such that half of all the data go to the
output line number 1. The output lines 2, 3, 4 and 5 receive
only ten percent of the data each. The rest of the lines
receive only two percent of the data. This is a biased
destination function. The fifth set again represents a
biased destination function with the output line number 2
receiving fifty percent of the data. The capacities of all
output lines are the same.

It is observed from Figure 6.79 that in the case of the
multiple processor design E(t_q) is almost constant for
all the sets of destination functions. One explanation
is that in the case of the multiple processor design any output
line with more packets destined for it may be provided
with a dedicated processor. Also, since the output lines are
slower than the processor, no large queue will build up. Of
course, the capacity of the output lines should be high enough
to absorb the packets destined for them. It is to be noted
that in Figure 6.79, the ratio of the maximum packet arrival rate
to the line capacity, .5λB/S_i, is less than unity. Thus the channel
capacity is large enough to handle the packet arrival rates
for even the line with a destination function of .5.

Thus it is seen that E(t_q) is almost constant for all
sets of destination functions.

Finally, Figures 6.80 - 6.83 show the effect of variation
in the number of processors on the average queue sizes for a
given packet arrival rate at the input, output, routing and
the sorting queues respectively. These figures also show
the effect on the average queue sizes of varying the packet
arrival rate for a fixed number of processors at the corresponding
queues. It should be noted that the queue sizes decrease as
the number of processors increases at the various queues.
6.6 Conclusions

Queue theoretic models have been developed for all
three proposed architectures. Graphs showing the average
waiting times, the overall average response times and average
queue sizes as functions of various design parameters have
been obtained. It is observed from these graphs that in most
cases the average waiting times and the average queue sizes
are reasonable. The overall response times and the queue
sizes are much smaller in the three processor case than in
the single processor case. These quantities are further
reduced in the multiple processor case, though not
proportionately.

The main incentive for using multiple processors is to 
increase the throughput. However, the response times and the 
queue sizes (the storage requirement) are also reduced in the 
process. Thus it seems that the multiple processor design is 
the one to be used. 
7.0 Summary

7.1 Suggestions for Future Work

Several suggestions for future work in the area of
processor-controlled packet switches are presented in [2].
An additional feature which is possible in the multiprocessor
architectures is the transmission of system status data to
each user. This scheme would require an additional processor
to monitor the system status. This
processor could monitor the status of ELIST, the Output Queue
Lists, and important system hardware. If this processor
discovered a hardware failure, a nearly empty ELIST or a nearly
filled queue list, it could generate a packet-length message
that would inform the user of the system problems. This
processor would be required to inform the Output Processor to
send a system status data packet to each user. Using the
received status information, users could re-route messages
around nonfunctioning channels, reduce their overall
throughput, or reduce their throughput to a specific user to avoid
packet losses.

Any system enhancements will be paid for in terms of 
throughput and/or the number of required processors. 

7.2 System Throughput 

All three packet switch architectures are capable of handling 
large system throughputs as shown in the following examples. 


7.2.1 Single Processor Packet Switch

F_P < 1.5 x 10^5 packets/sec.

Using a packet length of 10,240 bits, the maximum bit
rate for the system is

F_B = F_P x B < (1.5 x 10^5) x (10,240) bits/sec.

F_B < 1.54 x 10^9 bits/sec.

7.2.2 Three Processor Packet Switch

F_P < 5.21 x 10^5 packets/sec

Using the packet length of 10,240 bits, the bit rate for
this system is

F_B = F_P x B < (5.21 x 10^5) x (10,240) bits/second

F_B = 5.33 x 10^9 bits/second

7.2.3 Multiple Processor Packet Switch

Example System

F_B < 30 x 10^9 bits/second
N = 10 users
B = 10,240 bits

Packet throughput requirement for this system:

F_P = F_B/B < (30 x 10^9 / 10,240) packets/second
F_P < 2.93 x 10^6 packets/second
Since the throughput is limited by Equation 5.9, which
states that the packet throughput is limited by the number of users,
the value calculated above may not be obtainable for this
system. An evaluation must be made. Since

F_P < 2.93 x 10^6 < (1/t_p4) N = 4.39 x 10^6

is true, the proposed system can be built to handle the desired
bit rate. By using Equation 5.8 for each class of processors,
the total number of processors required for this system is
determined. Twenty-one processors are needed: five Input
Processors, five Sorting Processors, four Routing Processors and
seven Output Processors. This system using twenty-one
processors will provide a bit rate of 30 x 10^9 bits/second. As shown
above in the evaluation using Equation 5.9, this throughput is
not the maximum obtainable bit rate. Thus, if additional
processors were implemented, a larger throughput could be provided
to the ten users.
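
The throughput figures in sections 7.2.1 through 7.2.3 follow from the relation F_B = F_P x B and can be checked with a few lines of arithmetic. The exponents below (10^5 packets/sec, 10^9 bits/sec, 10^6 packets/sec) are readings of the scanned values chosen so that the quoted products are mutually consistent, and the user-limited bound is taken as quoted in the text rather than recomputed from Equation 5.9; both should be treated as assumptions of this sketch.

```python
PACKET_BITS = 10_240   # packet length B in bits


def bit_rate(packet_rate, packet_bits=PACKET_BITS):
    """F_B = F_P x B, in bits/second."""
    return packet_rate * packet_bits


def packet_rate(bits_per_second, packet_bits=PACKET_BITS):
    """F_P = F_B / B, in packets/second."""
    return bits_per_second / packet_bits


single_processor = bit_rate(1.5e5)     # section 7.2.1
three_processor = bit_rate(5.21e5)     # section 7.2.2
multi_required = packet_rate(30e9)     # section 7.2.3

# User-limited packet rate as quoted in the text (exponent assumed).
USER_LIMIT = 4.39e6

print(f"single processor: {single_processor:.3e} bits/s")
print(f"three processor:  {three_processor:.3e} bits/s")
print(f"multiple processor requirement: {multi_required:.3e} packets/s")
print(f"within user-limited bound: {multi_required < USER_LIMIT}")
```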

The cost of achieving these large throughputs is paid for
in terms of the number of processors required, the width of
the Microprogram ROM and the special purpose hardware and
software required to deal with contention problems. The major
trade-off in both designs is that a reduction in the software
executions is paid for in hardware complexity. Two prime
examples of this type of trade-off are the use of hardware
pollers and the large number of microprogram control bits,
which enable the execution of concurrent tasks.
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7.3 Queue Theoretic Results 

Queue theoretic models have been developed for all the 
three proposed architectures. Graphs showing the average 
waiting times , the overall average response times and average 
queue sizes as functions of various design parameters have 
been obtained. It is observed from these graphs that in most 
cases the average waiting times and the average queue sizes 
are reasonable. The overall response times and the queue 
sizes are much smaller in the three processors case than in 
the single processor case. These quantities are further 
reduced in the multiple processor case, however, not propor- 
tionately. 

The main incentive for using multiple processors is to 
increase the throughput. However, the response times and the 
queue sizes (the storage requirement) are also reduced in the 
process. Thus the multiple processor design appears to be 
the one to use. 
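
The effect of adding processors on waiting time and queue size can be illustrated with a textbook model. The sketch below uses the standard M/M/c (Erlang C) formulas, which are simpler than the priority-queue models developed in this report and are assumed here purely for illustration. The function names and the arrival and service rates are hypothetical.

```python
from math import factorial


def erlang_c(servers, arrival_rate, service_rate):
    """Probability that an arriving packet must wait in an M/M/c queue."""
    offered = arrival_rate / service_rate   # offered load in erlangs
    rho = offered / servers                 # utilization per server
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = offered ** servers / factorial(servers)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    return tail / ((1 - rho) * head + tail)


def mean_wait(servers, arrival_rate, service_rate):
    """Average time spent waiting in the queue (excluding service)."""
    p_wait = erlang_c(servers, arrival_rate, service_rate)
    return p_wait / (servers * service_rate - arrival_rate)


def mean_queue(servers, arrival_rate, service_rate):
    """Average number waiting, from Little's law."""
    return arrival_rate * mean_wait(servers, arrival_rate, service_rate)


# Same arrival stream served by one processor and by three processors.
for c in (1, 3):
    print(c, mean_wait(c, 0.8, 1.0), mean_queue(c, 0.8, 1.0))
```

For one server the formulas reduce to the familiar M/M/1 results, which gives a quick sanity check on the implementation.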

The major contribution of this work to the area of digital 
communications is the design of efficient multiprocessor packet 
switches which can provide large throughputs, special functions 
and flexibility not available in non-programmable systems. The 
overall performance of these packet switches will improve as 
faster hardware and processors become available. 
Fig. 6.2. The Modified Queuing Model. 

Legend: 
The input queue with priority 1. 
The output queue with priority 2. 
The queue for background service with priority 3. 

Average Waiting Time Vs. Utilization Factor at Queue 
Average Waiting Time Vs. Utilization Factor ... As Parameters. 
Overall Average Response Time Vs. Destination Functions 
Overall Average Response 
Fig. 6.20. Overall Average Response Time Vs. o in S 
Overall Average Response 
Overall Average Response 
Average Queue Size Vs. Utilization Factor 
Average Queue Size Vs. Utilization Factor At ... Parameters. 

Fig. 6.30. The Queueing Model for the Three Processor Design 
Average Waiting Time Vs. Utilization Factor At The Input Queue 
Waiting Time Vs. Utilization Factor At The Routing Queue 
Utilization Factor At The Routing Queue 
Fig. 6.36. Average Queue Size Vs. Utilization Factor At The Input And The Routing Queues (No Contention). 
Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Input Queue (For The Proposed Design, Clock Cycle Time = 120 ns). 
Fig. 6.40. Average Waiting Time Vs. Clock Cycle Time Of The Processor At The Output Queue (No Contention. For The Proposed Design, Clock Cycle Time = 120 ns). 
Waiting Time Vs. Clock Cycle Time Of The Processor At The Output Queue (Contention At All Times. For The Proposed Design, Clock Cycle Time = 120 ns). 

Waiting Time Vs. Utilization Factor At the Input, Output and 
Overall Average Response Time Vs. Utilization Factors at the Input, Output and Routing Queues. 
Average Waiting Time Vs. Clock Cycle Time of the Processor at the Input 
Fig. 6.45. Average Waiting Time Vs. Clock Cycle Time of the Processor at the Output Queue. 
Average Waiting Time Vs. Clock Cycle Time of the Processor at the Routing Queue. 
Destination Functions 
Fig. 6.51. Overall Average Response Time Vs. « in S. 

Overall Average Response 
Overall Average Response 
Fig. 6.58. Average Queue Size Vs. Utilization Factor At The Output Queue 
Average Queue Size At The Output Queue Vs. The Number of Output Lines 
Average Queue Size At The Output Queue Vs. The Number of Output Lines 
Clock Cycle Time 

Average Waiting Time Vs. Packet Arrival Rate At The Input Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Input Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Input Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Output Queue 
Fig. 6.70. Average Waiting Time Vs. Packet Arrival Rate At The Output Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Output Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Routing Queue 
Fig. 6.75. Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue 
Fig. 6.76. Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue 
Average Waiting Time Vs. Packet Arrival Rate At The Sorting Queue 
Fig. 6.78. Overall Average Waiting Time Vs. Packet Size 
Fig. 6.79. Overall Average Waiting Time Vs. Destination Functions 
Average Queue Size Vs. Packet Arrival Rate At The Output Queue 

APPENDIX A 

INPUT SERVICE ROUTINE MICROCODE 

APPENDIX B 

PROCESSOR-CONTROLLED ELIST 

Processor-Controlled ELIST Architecture. 
ELIST INPUT DATA Ports 
Fig. B3. ELIST RAM Structure 
Fig. B5. ELIST Service Routine Flowchart 

ELIST:  If DAV = 0, JMP to DAC 
            • Is there a Shift Register number to input? 
              NO: Jump to DAC Routine. 

        [Data Port]@Input Polling Circuit -> Q 
            • YES: Input the data from the selected port. 

        If DAC = 0, JMP to STORE; send a DAC; Release Input Port Poller 
            • Send a DAC to the input port and clear the Input 
              Port Poller. Meanwhile, check to see if any output 
              port requires new data. If none do, jump to the 
              STORE Routine. 

        ELIST OUTPUT PORT BASE ADDRESS -> ADDRESS LATCH 
        Q -> Selected Output Port 
            • If a port requires data, enable it onto the data 
              bus and send the data. 

        SEND A DAV; RELEASE Output Port Poller; JMP to ELIST 

DAC:    If DAC = 0, JMP to ELIST 
            • Is there an output port requesting service? 
              NO: JMP to ELIST Routine. 

        ELIST BASE ADDRESS -> ADDRESS LATCH 
        [ELIST]@EPTR -> Q 
            • YES: Fetch a S.R.# from ELIST. 

        ELIST OUTPUT PORT BASE ADDRESS -> ADDRESS LATCH; Decrement EPTR 
        Q -> Selected Output Port 
            • SEND DATA and update EPTR. 

        SEND A DAV; RELEASE OUTPUT PORT POLLER; JMP to ELIST 

STORE:  ELIST BASE ADDRESS -> ADDRESS LATCH; Increment EPTR 
        Q -> [ELIST]@EPTR; JMP to ELIST 
            • STORE the data in ELIST. 

Fig. B6. ELIST SERVICE ROUTINE 
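
The control flow of the service routine can be mirrored in a few lines of Python. This is a behavioural sketch only, not the microcode: it assumes that EPTR acts as a stack pointer (incremented before a store, decremented after a fetch), so that the ELIST behaves as a last-in first-out store of shift register numbers. The class and method names are hypothetical, the cycle counts are those listed in Fig. B7, and the empty-list check is an added assumption that does not appear in the listing.

```python
class Elist:
    """Last-in first-out store of shift register numbers (S.R.#)."""

    def __init__(self):
        # ELIST RAM; the length of this list plays the role of EPTR.
        self.ram = []

    def service(self, incoming=None, requested=False):
        """Run one pass of the routine.

        incoming  -- S.R.# offered at an input port (DAV = 1), or None
        requested -- True if an output port asks for data (DAC = 1)
        Returns (S.R.# sent to the output port or None, microcycles used).
        """
        if incoming is not None:
            if requested:
                # Case a: pass the number straight to the output port.
                return incoming, 6
            # Case c: STORE routine, increment EPTR and write.
            self.ram.append(incoming)
            return None, 5
        if requested:
            # Case b: DAC routine, read at EPTR and decrement EPTR.
            if not self.ram:
                # Assumption: the listing has no explicit empty check.
                raise IndexError("ELIST is empty")
            return self.ram.pop(), 7
        # Case d: nothing available and nothing required.
        return None, 2


elist = Elist()
print(elist.service(incoming=5))        # stored in RAM
print(elist.service(requested=True))    # fetched back from RAM
```

Modelling the RAM as a stack keeps the sketch short; a different addressing scheme would only change the two lines that touch `self.ram`.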
ELIST Service Routine 

a) S.R.# available as well as requested    6 cycles = 0.72 µSec 
b) S.R.# required from RAM                 7 cycles = 0.84 µSec 
c) S.R.# stored in RAM                     5 cycles = 0.60 µSec 
d) No data available or required           2 cycles = 0.24 µSec 

Fig. B7. Software Execution Times 
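
The execution times follow directly from the cycle counts. The short Python check below assumes a 120 ns microcycle, which is the value implied by 6 cycles taking 0.72 microseconds. The names used are illustrative only.

```python
CYCLE_NS = 120  # microcycle time in nanoseconds, implied by the table

# Cycle counts for the four cases of the ELIST service routine.
CASES = {
    "a) S.R.# available as well as requested": 6,
    "b) S.R.# required from RAM": 7,
    "c) S.R.# stored in RAM": 5,
    "d) No data available or required": 2,
}


def execution_time_us(cycles, cycle_ns=CYCLE_NS):
    """Convert a microcycle count to microseconds."""
    return cycles * cycle_ns / 1000


for label, cycles in CASES.items():
    print(f"{label}: {cycles} cycles = {execution_time_us(cycles):.2f} microseconds")
```

Working in integer nanoseconds before dividing avoids accumulating floating-point error in the conversion.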